ENGINEERING APPLICATION OF NEURAL COMPUTING:
A STATE-OF-THE-ART SURVEY

1  INTRODUCTION 


1.1  Background 

In the past 5 years, neural networks, growing out of McCulloch and Pitts’ computational model of
the neuron (McCulloch and Pitts, 1943) and propelled by the work of Hopfield (Hopfield, 1982) and
the Parallel Distributed Processing (PDP) research group (Rumelhart and McClelland, 1986), have
attracted tremendous enthusiasm and research interest from computer scientists, neurophysiologists,
and engineers. This remarkable phenomenon parallels the research in Artificial Intelligence (AI) of
the 1970s. The new interest is supported by the realization that neural computing is inherently parallel
and functionally closer to the operation of the brain; that is, it has the capability of
self-organization or learning. In addition to the recognition of the capability of neural networks and
the development of computing technology, other factors have contributed to the explosion of interest
in this area: 1) a neural network is a universal approximator, or is computationally complete, which
means that an appropriate neural network with appropriate training rules can, in principle, solve
virtually any computational task; 2) neural computing takes a middle ground between the traditional
mathematical approach and the symbolic AI approach by using numerical methods for learning and
expansive representation schemes, as well as adopting a functional use of experimental knowledge;
3) with the recent insight into algorithms that improve the learning ability of a neural network, it
provides an efficient alternative for problems that are currently difficult for conventional approaches,
such as speech and natural language processing, vision and image analysis, and pattern recognition;
4) it may provide some insight into the computational characteristics of the brain; and 5) neural
networks are intrinsically suited to implementation on massively parallel hardware (Aleksander, 1989;
Barto, 1989).

Research interest and increasing funding in neural networks have generated numerous kinds of
architectures and learning paradigms. With the advance and sophistication of some branches of neural
computing, the technology has been successfully tailored for a wide range of applications, such as the
modeling of cognitive processes, language understanding, and pattern recognition, as illustrated by the
large range of subjects covered in papers appearing in conferences on neural networks (IEEE, 1987,
1988; NIPS, 1988, 1989; IJCNN, 1989, 1990). To facilitate the development and application of this
emerging technology, the National Science Foundation (NSF) also established the Neural Engineering
Program in conjunction with other funding agencies such as the Defense Advanced Research Projects
Agency (DARPA), the Office of Naval Research (ONR), and the National Aeronautics and Space
Administration (NASA). With the advance of neural network hardware and software, this new
technology has the potential to solve some difficult engineering problems. Research in the application
of neural computing to engineering problems, especially civil engineering problems related to U.S.
Army Construction Engineering Research Laboratory (USACERL) research, is still in its fledgling
stage. Therefore, it is imperative to discern and evaluate the state of the art of neural computing
technology and its practical applications in order to provide insight into the potential opportunities
and benefits in USACERL-related research.

1.2  Objectives  and  Approach 

The main objective of this report is to provide a series of short descriptions of how neural networks
have been used in fields related to USACERL research and to provide extensive references on seminal
works in neural computing. The approach was as follows:

• Review new developments and current research on different types of neural networks and
learning paradigms.

• Review publications on neural network applications to engineering problems related to
and/or of interest to USACERL research projects. Identify the neural computing
methodologies applied to different applications and evaluate their potential for research
projects in USACERL.

•  Provide  a  comprehensive  set  of  bibliographic  references  on  seminal  and  representative 
publications  in  neural  computing. 

References 

1. Proceedings of the IEEE First International Conference on Neural Networks, Institute of
Electrical and Electronics Engineers (IEEE), New York, June 1987.

2. Proceedings of the IEEE International Conference on Neural Networks, Institute of Electrical
and Electronics Engineers (IEEE), New York, June 1988.

3. Proceedings of the 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and
T. Sejnowski (Eds.), Carnegie Mellon University, Morgan Kaufmann Publishers, San Mateo, CA,
1989.

4. Proceedings of the 1988 IEEE Conference on Neural Information Processing Systems - Natural
and Synthetic: Advances in Neural Information Processing Systems 1, D. S. Touretzky (Ed.),
Morgan Kaufmann Publishers, San Mateo, CA, 1989.

5.  Proceedings  of  the  1989  IEEE  Conference  on  Neural  Information  Processing  Systems  -  Natural 
and  Synthetic:  Advances  in  Neural  Information  Processing  Systems  2,  D.  S.  Touretzky  (Ed.), 
Morgan  Kaufmann  Publishers,  San  Mateo,  CA,  1990. 

6. Proceedings of the International Joint Conference on Neural Networks, Co-sponsored by IEEE
and the International Neural Network Society (INNS), Washington, D.C., 1989.

7. Proceedings of the International Joint Conference on Neural Networks, Co-sponsored by IEEE
and the International Neural Network Society (INNS), Washington, D.C., 1990.

8. Proceedings of the International Joint Conference on Neural Networks, Co-sponsored by IEEE
and the International Neural Network Society, San Diego, 1990.

9. Proceedings of the Third International Conference on Genetic Algorithms, J. D. Schaffer (Ed.),
Morgan Kaufmann Publishers, San Mateo, CA, 1990.

10. AIP Conference Proceedings 151: Neural Networks for Computing, J. S. Denker (Ed.), American
Institute of Physics (AIP), Snowbird, UT, 1986.

11.  Aleksander,  I.  (Ed.),  Neural  Computing  Architecture,  The  MIT  Press,  Cambridge,  MA,  1989. 

12. Barto, A. G., Connectionist Learning for Control: An Overview, COINS Technical Report 89-89,
Dept. of Computer and Information Science, University of Massachusetts, Amherst, MA,
1989.

13.  Hopfield,  J.  J.,  “Neural  Networks  and  Physical  Systems  with  Emergent  Collective 
Computational  Abilities,”  Proceedings  of  the  National  Academy  of  Sciences,  79,  2554-2558, 
1982. 

14.  McCulloch,  W.  S.,  and  Pitts,  W.,  “A  Logical  Calculus  of  the  Ideas  Immanent  in  Nervous 
Activity,”  Bulletin  of  Mathematical  Biophysics,  5,  pp.  115-133, 1943. 

15. Rumelhart, D., and McClelland, J. (Eds.), Parallel Distributed Processing: Explorations in the
Microstructure of Cognition; Vol. 1: Foundations, The MIT Press, Cambridge, MA, 1986.

2 NEURAL NETWORKS


2.1  Introduction 


Neural networks, also referred to as Parallel Distributed Processing (PDP) systems and
Connectionist systems, are computational methods inspired by or loosely modeled after the structure
and internal information processing operations of the human brain. In general, according to
Rumelhart, Hinton and McClelland (1986), a neural network is made up of the following components:


• Processing Units (neurons): the functions of each neuron are to receive signals, perform
simple computation, and send out signals through outgoing connections.

•  Connections:  each  connection  between  neurons  functions  like  a  multiplicative  filter  with 
its  connection  strength  or  weight. 

•  Rules  of  Activation  Propagation. 

•  Rules  of  Learning. 

• Network Architecture (Topology).


Based on the composition of the network topology, the form of the activation functions, and the rule
of learning, different kinds of neural networks have been developed. Some of the well-known networks
are Perceptrons (Rosenblatt, 1962), Adaline and Madaline (Widrow and Hoff, 1960), the Hopfield
network (Hopfield, 1982), the Boltzmann Machine (Hinton, et al., 1983, 1984), the Kohonen
self-organizing network (Kohonen, 1984), the competitive learning network (Grossberg, 1976;
Rumelhart and Zipser, 1986), the Adaptive Resonance Theory (ART) (Carpenter and Grossberg, 1987),
the recurrent type networks (Jordan, 1986; Elman, 1988; Williams and Zipser, 1989), and the
Backpropagation networks (Rumelhart, et al., 1986; Parker, 1982; Werbos, 1974). In general, learning
mechanisms can be categorized into three forms: supervised learning, unsupervised learning, and
reinforcement learning (Hinton, 1989).

One of the important features of neural networks is their capability of self-organization
or learning. During training, a neural network is presented with examples or data of the concept to
capture. It then internally modifies its interconnection strengths, or connection weights, through the
rule of learning. After completion of the training session, the knowledge is stored in the pattern of
connection strengths of the network.

In the following paragraphs, different types of neural networks and their characteristics are
described. Relevant references are also included. It should be kept in mind that the references listed
here are by no means exhaustive. Our intention is to provide pointers for those interested in
neurocomputing applications to fundamental works and notable achievements as well as the
state-of-the-art research in different branches of connectionist systems.

2.2 Feedforward Multilayer Neural Networks

Feedforward neural networks, developed from the Perceptron (Rosenblatt, 1958), are also referred to
as multilayer perceptrons. It may be claimed that the revival of neural networks research is closely
related to the development of backpropagation neural networks and the famous Generalized Delta
Rule (Rumelhart, et al., 1986). Because of their simplicity in learning and architecture construction, as
well as a sense of control over the training process, backpropagation neural networks have been widely
used in most of the applications involving functional representation and mapping. However,
multilayer feedforward networks have several drawbacks: their convergence in learning is slow, their
architecture cannot be determined a priori, and they offer no direct means of using a priori knowledge.

Recently, several approaches have been proposed to improve the performance of backpropagation
neural networks. In general, there are six approaches to the problem: 1) using a better representation
scheme for input and output, 2) employing higher order learning algorithms other than the gradient
descent method, such as the quasi-Newton methods, 3) applying numerical techniques to preprocess
the input pattern to introduce independence into the input space, 4) designing innovative training
schemes so that certain knowledge is built into the network before the final training session,
5) incorporating network geometry adaptation with an efficient learning algorithm to build a robust
modeling environment, and 6) determining the architecture and training with heuristic rules. The
following paragraphs describe the perceptron, backpropagation networks, and their variants.

2.2.1 Perceptron


The elementary perceptron is a two-layer feedforward neural network consisting of an input layer
and an output layer, with the input units fully connected to the output units (Rosenblatt, 1958). The
network is basically a heteroassociative, nearest-neighbor pattern matcher in that it maps input
patterns presented at the input layer directly to output patterns at the output layer. Connection
weights in a perceptron are adjusted using the perceptron learning procedure, whose convergence is
established by the Perceptron Convergence Theorem (Rosenblatt, 1962), or using the Delta Rule, and
the activation function is the standard step function. Rosenblatt (1962) has shown that the perceptron
can solve a large number of linear mapping problems. However, because it lacks a hidden layer for
representing intermediate relations, the perceptron fails to handle problems that are not linearly
separable, such as the encoding of the exclusive-or (XOR) function, and has poor generalization
capability. The critical analysis of the perceptron by Minsky and Papert (1969) with regard to its
limited mapping capability, together with the lack of a powerful training algorithm for multilayer
perceptrons at that time, in some way halted the development of neural network research in the
1970s.

The training of a perceptron is an iterative process and the algorithm is:

• Randomly initialize the weights and the threshold value at each unit;

• Compute the output at each output unit:

$o_j = f\left( \sum_i w_{ij} a_i - \theta_j \right)$

where f is a step activation function, $\theta_j$ the threshold value, $a_i$ the input activation, $o_j$ the
output activation, and $w_{ij}$ represents the connection strength between input node i and output node j.

• Adjust the weights using the Delta Rule: $\Delta w_{ij} = \eta \, (t_j - o_j) \, a_i$, where $\eta$ is the
learning rate, $t_j$ is the expected output, and $o_j$ is the network prediction.
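
The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure of any cited work: the function names, the learning rate, the number of epochs, and the AND training set are all hypothetical, and the threshold update is an added assumption (the threshold is treated as a weight on a constant input of -1), since the text only states the rule for the weights.

```python
import random

def step(x):
    """Step activation: 1 when the argument is non-negative, else 0."""
    return 1.0 if x >= 0.0 else 0.0

def output(w, theta, a):
    """o_j = f(sum_i w_ij a_i - theta_j) for every output unit j."""
    return [step(sum(w[i][j] * a[i] for i in range(len(a))) - theta[j])
            for j in range(len(theta))]

def train_perceptron(patterns, eta=0.1, epochs=200, seed=0):
    """patterns: list of (input vector a, target vector t) pairs."""
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    # Random initial weights w[i][j] (input i -> output j) and thresholds.
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    theta = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    for _ in range(epochs):
        for a, t in patterns:
            o = output(w, theta, a)
            for j in range(n_out):
                err = t[j] - o[j]
                for i in range(n_in):
                    w[i][j] += eta * err * a[i]  # Delta Rule
                # Assumption: the threshold is adapted like a weight on a
                # constant input of -1; the text gives only the weight rule.
                theta[j] -= eta * err
    return w, theta

# Logical AND is linearly separable, so the perceptron can learn it.
AND = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [0.0]),
       ([1.0, 0.0], [0.0]), ([1.0, 1.0], [1.0])]
w, theta = train_perceptron(AND)
predictions = [output(w, theta, a) for a, _ in AND]
```

Replacing the targets with those of XOR would leave the loop cycling without ever classifying all four patterns correctly, which is the limitation discussed above.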


Figure  2.1  -  A  Sample  Perceptron 


Because of its limited learning capability, the perceptron is mostly suited for linear mapping
problems with binary outputs. Recent research has shown that the capability of the perceptron can be
improved by incorporating higher order connectivity or higher order correlations, and fuzzy logic
(Maxwell, Giles and Lee, 1986; Keller and Hunt, 1985).

2.2.2 Backpropagation Neural Networks

Backpropagation neural networks are multilayer feedforward networks with the Generalized Delta
Rule as their learning rule (Parker, 1982; Rumelhart, et al., 1986; Werbos, 1974). Learning in a
backpropagation network is supervised, which means that the expected output is included in the
training data that the network is supposed to learn. The architecture of all backpropagation networks
is layered, consisting of an input layer, an output layer, and one or more hidden layers.

The training process via the generalized delta rule is an iterative process. Each training cycle
includes two phases: forward propagation of signals from the input to the output layer, and backward
propagation of error signals seen at the output layer. Thus each cycle involves the determination of the
error associated with each output unit and the modification of the weights on the network connections.
The learning capacity of a backpropagation network depends on the number of nodes in the hidden
layer(s) and the pattern of connections between nodes in adjacent layers (Hornik, et al., 1989).

A feedforward network computation with these backpropagation neural networks proceeds as
follows:

1) The units in the input layer receive their activations in the form of an input pattern and this
initiates the feedforward process.

2) The processing units in each layer receive outputs from other units and perform the following
computations:

a) Compute their net input $N_j$,

$N_j = \sum_{k=1}^{M} w_{jk} o_k$

where $o_k$ = output from units impinging on unit j, and
M = number of units impinging on unit j.

b) Compute their activation values from their net input values,

$a_j = F_j(N_j)$

$F_j$ is usually a sigmoidal function.

c) Compute their outputs from their activation values. In the neural network type used in
this study the output is the same as the activation value.

$o_j = a_j$

3) The output values are sent to other processing units along the outgoing connections.

4) This process continues until the processing units in the output layer compute their activation
values. These activation values are the output of the neural computations.
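
As a rough illustration of steps 1) to 4), the forward pass can be written as a loop over layers. The sketch below assumes fully connected layers, a logistic sigmoid for every $F_j$, and no threshold terms; the function names and the nested-list weight layout are hypothetical choices made only for this example.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, one common choice for the activation function F."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(layers, pattern):
    """Propagate an input pattern through the network.

    layers[l][j][k] holds w_jk, the weight on the connection from unit k of
    layer l to unit j of layer l + 1. Returns the outputs of every layer,
    starting with the input pattern itself.
    """
    outputs = [list(pattern)]
    for weights in layers:
        prev = outputs[-1]
        # Net input N_j = sum_k w_jk o_k for each unit j in the next layer.
        nets = [sum(w_jk * o_k for w_jk, o_k in zip(row, prev)) for row in weights]
        # Activation a_j = F(N_j); the output o_j equals the activation.
        outputs.append([sigmoid(n) for n in nets])
    return outputs

# A 2-2-1 network with arbitrary illustrative weights.
example_layers = [
    [[0.5, -0.4], [0.3, 0.8]],   # input layer -> hidden layer
    [[1.0, -1.0]],               # hidden layer -> output layer
]
example_outputs = forward(example_layers, [1.0, 0.0])
```

The last entry of `example_outputs` is the activation of the output layer, that is, the result of the neural computation for this pattern.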

The modification of the strengths of the connections in the Generalized Delta Rule, described in
(Rumelhart, et al., 1986), is accomplished through gradient descent on the total error in a given
training case:

$\Delta w_{ji} = \eta \, \delta_j \, o_i$

In this equation, $w_{ji}$ is the weight on the connection from unit i to unit j, $o_i$ is the output of
unit i, $\eta$ is a learning constant called the “learning rate”, and $\delta_j$ is the negative gradient of
the total error with respect to the net input at unit j. At the output units, $\delta_j$ is determined from
the difference between the expected activations $t_j$ and the computed activations $a_j$:

$\delta_j = (t_j - a_j) \, F'(N_j)$

where $F'$ is the derivative of the activation function.

At the hidden units the expected activations are not known a priori. The following equation gives a
reasonable estimate of $\delta_j$ for the hidden units:

$\delta_j = \left( \sum_{k} \delta_k w_{kj} \right) F'(N_j)$

where the sum runs over the units k that receive output from unit j.

In this equation, the error attributed to a hidden unit depends on the error of the units it influences.
The amount of error from these units attributed to the hidden unit depends on the strength of the
connection from the hidden unit to those units; a hidden unit with a strong excitatory connection to a
unit exhibiting error will be “blamed” for this error, causing this connection strength to be reduced.
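
The update equations can be made concrete with a single training step for a network with one hidden layer. This is only a sketch under stated assumptions: logistic sigmoid units (so that $F'(N) = a(1 - a)$), no threshold terms, and hypothetical names (`train_step`, `w_hid`, `w_out`) and parameter values chosen for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(w_hid, w_out, x, t, eta=0.5):
    """One generalized-delta-rule update for a one-hidden-layer network.

    w_hid[j][i]: weight from input i to hidden unit j.
    w_out[k][j]: weight from hidden unit j to output unit k.
    Returns the squared error measured before the update.
    """
    # Forward propagation of the input pattern.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    a = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w_out]
    # Output units: delta_k = (t_k - a_k) F'(N_k), with F' = a (1 - a).
    d_out = [(tk - ak) * ak * (1.0 - ak) for tk, ak in zip(t, a)]
    # Hidden units: delta_j = (sum_k delta_k w_kj) F'(N_j), pre-update weights.
    d_hid = [sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
             * h[j] * (1.0 - h[j]) for j in range(len(h))]
    # Weight change = eta * delta of receiving unit * output of sending unit.
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += eta * d_out[k] * h[j]
    for j, row in enumerate(w_hid):
        for i in range(len(row)):
            row[i] += eta * d_hid[j] * x[i]
    return sum((tk - ak) ** 2 for tk, ak in zip(t, a))

# Repeated presentation of one pattern to a 2-3-1 network.
rng = random.Random(0)
w_hid = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(3)]
w_out = [[rng.uniform(-1.0, 1.0) for _ in range(3)]]
errors = [train_step(w_hid, w_out, [1.0, 0.0], [1.0]) for _ in range(200)]
```

The list `errors` records the squared error before each update, so a falling sequence indicates that the gradient descent is reducing the error for this training case.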

Up to now, backpropagation neural networks have been utilized in most neurocomputing
applications due to their robustness in learning. Problems suitable for backpropagation networks
usually have the following features: 1) certain relationships exist between the input and output
variables, 2) a comprehensive set of data from tests or experiments is available, and 3) the knowledge
to capture is included in the experimental data. A general architecture for backpropagation neural
networks is shown in Figure 2.2.

Figure 2.2 - A Sample Backpropagation Neural Network


From our discussion of the mechanism of the backpropagation network, the main tasks involved in
using the network are: 1) determination of the architecture and 2) selection of learning algorithms for
training the architecture. Though trial and error might have worked for certain simple problems in
architecture determination, new and adaptive schemes are required in order to tackle complex
real-world problems with efficiency and elegance. In the following subsections, some of the new
approaches which in one way or another improve on the standard backpropagation algorithm are
succinctly described.

2.2.3 Gram-Schmidt Backpropagation Network

The training of a backpropagation network is much easier, and its convergence faster, if the input
vectors are least correlated, that is, orthogonal or independent. Based on this observation, Orfanidis
(1990) proposed the Gram-Schmidt Neural Nets by inserting a Gram-Schmidt preprocessor at each
layer in a regular backpropagation network. To store the preprocessed input vector impinging at each
layer, an additional vector Zn is required. The general architecture of the Gram-Schmidt network is
shown in Fig. 2.3.

The decorrelation process on the input vector X using the Gram-Schmidt algorithm proceeds as
follows: for i = 1, 2, ..., M (number of units in a layer)

$z_i = x_i - \sum_{j=1}^{i-1} b_{ij} z_j$

or in matrix form, X = B Z, where B is a unit lower triangular matrix.
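
A small batch version of this recursion is sketched below. The text does not state how the coefficients $b_{ij}$ are obtained, so the sketch assumes the usual Gram-Schmidt choice, $b_{ij} = \langle x_i, z_j \rangle / \langle z_j, z_j \rangle$ with the inner products taken over the training patterns; that choice, the function name, and the sample data are all assumptions made for illustration.

```python
def gram_schmidt_decorrelate(patterns):
    """Decorrelate the components of a set of input vectors.

    patterns: list of input vectors x, each of length M.
    Returns (B, Z): B is a unit lower triangular M x M matrix and Z holds one
    decorrelated vector z per pattern, with x_i = sum_j B[i][j] * z_j.
    """
    m = len(patterns[0])
    z = [[0.0] * m for _ in patterns]
    b = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for i in range(m):
        for j in range(i):
            # Assumption: b_ij = <x_i, z_j> / <z_j, z_j>, summed over patterns.
            num = sum(x[i] * zp[j] for x, zp in zip(patterns, z))
            den = sum(zp[j] * zp[j] for zp in z)
            b[i][j] = num / den if den > 0.0 else 0.0
        for x, zp in zip(patterns, z):
            # z_i = x_i - sum_{j < i} b_ij z_j
            zp[i] = x[i] - sum(b[i][j] * zp[j] for j in range(i))
    return b, z

X = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 5.0, 2.0], [4.0, 3.0, 4.5]]
B, Z = gram_schmidt_decorrelate(X)
# Cross-correlation between the first two decorrelated components.
cross_01 = sum(zp[0] * zp[1] for zp in Z)
```

With this choice of coefficients the components of Z are mutually uncorrelated over the pattern set, and multiplying B by any row of Z recovers the corresponding row of X.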

The training algorithm in the standard backpropagation mode is as follows:

1) Initialize the G weight matrix with random values and calculate the initial B matrix.

2) Feedforward computation: for k = 1, 2, ..., N (number of layers)

2.1) Solve $B^k Z^k = X^k$ for $Z^k$ through forward substitution,

2.2) Calculate $U^{k+1} = G^k Z^k$ and $X^{k+1} = f(U^{k+1})$, where f is a vector containing the
sigmoid function evaluated at $U^{k+1}$.

3) Error Calculation

3.1) At the output layer: $e^N = D^N (d - X^N)$, where $D^N = \mathrm{diag}\{ f'(U^N) \}$.

3.2) At the hidden layers: for k = N-1, N-2, ..., 2

Solve $(B^k)^T t^k = (G^k)^T e^{k+1}$ for $t^k$ through backward substitution
and calculate $e^k = D^k t^k$.

4) Weight Update

$\Delta b_{ij}^k = \mu \, t_i^k z_j^k$ and $\Delta g_{ij}^k = \mu \, e_i^{k+1} z_j^k$

Figure 2.3 - A Sample Two Layers in a Gram-Schmidt Network (layer k with $U^k$, $X^k$, $Z^k$ and layer
k+1 with $U^{k+1}$, $X^{k+1}$, $Z^{k+1}$; here $U^k$, $X^k$ denote the input and output vector at each layer,
and $Z^k$ denotes the Gram-Schmidt preprocessed vector at layer k.)

It should be noted that the effectiveness of the Gram-Schmidt nets depends on the
eigenvalue spread of the covariance matrix R of the input pattern $X^0$, where R is defined as:

$R = \sum_{\text{patterns}} X^0 (X^0)^T$

If the ratio of the largest eigenvalue to the smallest eigenvalue of matrix R is large, the
Gram-Schmidt preprocessor will be very effective.
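
The eigenvalue spread is easy to inspect for small cases. The sketch below restricts itself to two-component input patterns so that the eigenvalues of the symmetric matrix R can be written in closed form; the function names and the two sample pattern sets are hypothetical, and a singular R (smallest eigenvalue of zero) is not handled.

```python
import math

def correlation_matrix(patterns):
    """R = sum over patterns of x x^T, for two-component patterns."""
    r = [[0.0, 0.0], [0.0, 0.0]]
    for x in patterns:
        for i in range(2):
            for j in range(2):
                r[i][j] += x[i] * x[j]
    return r

def eigenvalue_spread(r):
    """Ratio of the largest to the smallest eigenvalue of a symmetric 2x2 matrix."""
    mean = 0.5 * (r[0][0] + r[1][1])
    radius = math.hypot(0.5 * (r[0][0] - r[1][1]), r[0][1])
    # The eigenvalues are mean + radius and mean - radius.
    return (mean + radius) / (mean - radius)

# Strongly correlated components give a large spread.
correlated = [[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [-1.0, -1.2]]
# Uncorrelated components of equal power give a spread of one.
uncorrelated = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
spread_correlated = eigenvalue_spread(correlation_matrix(correlated))
spread_uncorrelated = eigenvalue_spread(correlation_matrix(uncorrelated))
```

In the sense of the paragraph above, the first pattern set is the kind of input for which the preprocessor is expected to help, while the second leaves little to gain.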


References 

1.  Golub,  G.  H.,  and  Van  Loan,  C.  F.,  Matrix  Computations,  The  Johns  Hopkins  University  Press, 
Baltimore,  Maryland,  1983. 

2.  Orfanidis,  S.  J.,  “Gram-Schmidt  Neural  Nets,”  Neural  Computation  2,  pp.  116-126, 1990. 

3. Orfanidis, S. J., Optimum Signal Processing, 2nd Edition, McGraw-Hill Book Co., New York,
1988.




2.2.4  The  Cascade-Correlation  Neural  Network 


As pointed out by Fahlman and Lebiere (1990), the main problems associated with the
convergence rate of backpropagation networks are the step-size problem and the moving target
problem. The former refers to the use of a constant step size and the latter to the inability of hidden
nodes to quickly capture regularity from only input signals and error signals, without taking account
of the lateral interactions, when the error surface changes frequently. These problems are compounded
by the determination of the network architecture, especially the number of nodes in the hidden layers,
which usually cannot be defined a priori. The Cascade-Correlation dynamic node generation network
(Fahlman and Lebiere, 1990) offers a principled approach to solving the learning problem.

A Cascade-Correlation Network starts with a basic network, then trains and adds new hidden
units one by one, creating a multilayer structure. There are two processes involved in the construction
of a Cascade-Correlation network. The first deals with the architecture generation and the second
deals with the learning algorithm. According to Fahlman’s description, hidden nodes are added to the
network one by one. Each hidden node receives a connection from each of the network’s original
inputs and also from every pre-existing hidden unit. Once a unit is added to the network, its weights on
the input side are frozen and only the weights on the output side are trained.

Figure 2.4 - The Evolution of Cascade-Correlation Network (initial state: the network is initially
trained as a perceptron; state after adding hidden unit #1: the connections denoted by solid lines are
frozen after correlational training, and the remaining connections are trained repeatedly.)

The unit creation algorithm essentially performs the following operations. It starts with a candidate
unit (or a pool of candidate units) that receives connections from the network’s external inputs and
from previous hidden units, with the output of the candidate unit not yet connected to the trained
network. Then the newly created candidate network is trained using quickprop (Fahlman, 1988) on the
training sets, and only the candidate unit’s input weights are adjusted after each pass. The training
criterion for the input weights is to maximize the correlation measurement S defined as

$S = \sum_{o} \left| \sum_{p} (V_p - \bar{V}) (E_{p,o} - \bar{E}_o) \right|$

where p indexes the training patterns and the bars denote averages over the patterns. S is the sum
over all output units o of the magnitude of the correlation between V, the candidate unit’s activation
value, and $E_o$, the residual output error observed at unit o. When S stops improving, the new
candidate is installed in the trained network and its input weights are then frozen. This process
continues until the concept in the training sets is properly captured by the network. The cascade
architecture evolution process is illustrated in Fig. 2.4.
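
To make the correlation measurement concrete, the short sketch below evaluates S for a candidate unit given its activations and the residual errors over a set of patterns. The function name and the toy data are hypothetical, and the maximization of S by quickprop is deliberately left out.

```python
def correlation_score(v, e):
    """S = sum_o | sum_p (V_p - mean(V)) (E_po - mean(E_o)) |.

    v: candidate unit activations, one value per training pattern.
    e: residual errors, e[p][o] for pattern p and output unit o.
    """
    n = len(v)
    v_mean = sum(v) / n
    score = 0.0
    for o in range(len(e[0])):
        e_mean = sum(row[o] for row in e) / n
        score += abs(sum((v[p] - v_mean) * (e[p][o] - e_mean) for p in range(n)))
    return score

activations = [0.0, 1.0, 0.0, 1.0]
# Residual error that rises and falls with the candidate's activation.
s_tracking = correlation_score(activations, [[0.0], [1.0], [0.0], [1.0]])
# Residual error that is unrelated to the candidate's activation.
s_unrelated = correlation_score(activations, [[0.0], [0.0], [1.0], [1.0]])
```

A candidate whose activation follows the residual error (or its mirror image, because of the absolute value) scores high and is therefore a useful unit to install, whereas an unrelated candidate scores zero.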

The main advantage of the Cascade-Correlation Architecture over other existing learning
algorithms is that it learns very quickly and it systematically determines its network structure during
the training process. The power of this learning algorithm has been illustrated through modeling the
two spirals problem (Lang and Witbrock, 1988).

2.2.5 The Self-Organizing Neural Network (SONN)

The SONN proposed by Tenorio and Lee (1989) is a supervised learning algorithm for architecture
construction and refinement in feedforward neural networks. The learning process is controlled by a
modified Minimum Description Length criterion (Rissanen, 1983) which is also used as an optimality
criterion to guide the construction of the network structure, instead of the simplistic mean-square
error. The search for the correct model structure or network architecture is accomplished via Simulated
Annealing (Kirkpatrick, et al., 1983) so that the node accepting rule varies at run-time according to a
cooling temperature schedule.

It should be pointed out that SONN was proposed to solve a general system identification problem,
so its structure bears a close relation to the representation of nonlinear systems. The
SONN algorithm can be characterized by three components: 1) a generating rule for the primitive
neuron transfer functions, 2) an evaluation method to assess the quality of the model, and 3) a structure
search strategy via Simulated Annealing. The algorithm can be conceptually put in the following form:

1) Initialize the cooling temperature and the state with basic nodes.

2) Repeat the following procedures until the temperature is smaller than the terminal
temperature for simulated annealing:

2.1) Repeat the following computations until the number of new neurons is greater than
the number of observations:

2.1.1) Use the neuron generating rule to add new neurons to the structure and
calculate the energy corresponding to the current and new states;

2.1.2) If the energy of the new state is smaller than that of the current state, then
accept the neuron; otherwise accept the neuron with a probability governed by
the current temperature.

2.2) Decrease the temperature via a geometric annealing sequence.
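
The control structure of this search can be sketched generically. The example below is not the SONN algorithm itself: it assumes the standard Metropolis acceptance probability exp(-(E_new - E_cur)/T), replaces the neuron generating rule and the description-length energy with hypothetical `propose` and `energy` callables, and uses a fixed number of moves per temperature in place of the stopping rule of step 2.1.

```python
import math
import random

def accept(energy_new, energy_cur, temperature, rng):
    """Always accept a lower-energy state; otherwise accept with a
    probability exp(-(E_new - E_cur) / T) (assumed Metropolis rule)."""
    if energy_new < energy_cur:
        return True
    return rng.random() < math.exp(-(energy_new - energy_cur) / temperature)

def anneal(energy, propose, state, t_start=1.0, t_end=1e-3, alpha=0.9,
           moves_per_temperature=20, seed=0):
    """Simulated annealing with a geometric cooling schedule T <- alpha * T."""
    rng = random.Random(seed)
    temperature = t_start
    best = state
    while temperature > t_end:
        for _ in range(moves_per_temperature):
            candidate = propose(state, rng)
            if accept(energy(candidate), energy(state), temperature, rng):
                state = candidate
                if energy(state) < energy(best):
                    best = state
        temperature *= alpha  # geometric annealing sequence
    return best

# Toy problem: find the integer that minimizes (n - 7)^2, starting from 0.
best_state = anneal(energy=lambda n: (n - 7) ** 2,
                    propose=lambda n, rng: n + rng.choice([-1, 1]),
                    state=0)
```

Early on, when the temperature is high, uphill moves are accepted fairly often; as the temperature decays geometrically the rule becomes increasingly selective, which is the run-time variation of the accepting rule mentioned above.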

The performance of the algorithm was illustrated through modeling a highly chaotic time series for
system identification and short time prediction and through comparison with standard
backpropagation networks. Though the search algorithm with Simulated Annealing is a non-deterministic
scheme, the SONN shows a remarkable advantage over the standard backpropagation learning
algorithm. The SONN requires far fewer samples to acquire an estimate of the system, and the
structure of the network is determined at run-time. Because of the use of Simulated Annealing in the
learning algorithm, it is less susceptible to the problem of local minima, and the convergence of
learning does not depend on the initial set of weights. In addition, a system modeled with the SONN
algorithm shows better performance in prediction. However, three types of nodes with different
functionalities are defined in this network, so the algorithm has more parameters and is more complex
than the straightforward backpropagation network. Certain fine tuning routines are needed for a
successful implementation of the algorithm.

2.2.6 The GrowNet Algorithm

The GrowNet algorithm was proposed by Smith (1990) to solve the topology determination problem
associated with backpropagation networks. This approach resembles the SONN algorithm described
in the previous subsection in that it grows the net incrementally, but it uses a heuristic rule to choose
the next step in growing a net, and each step is reversible, whereas the SONN algorithm adopts a
stochastic search algorithm to generate a suitable topology.

The GrowNet algorithm begins with a simple network and tries to minimize the simple mean-square
error using the gradient descent method. At each epoch, the prospect of further reduction of the
error is checked. If the prospect is poor, the net is enlarged by a growth process; otherwise
the gradient descent continues. There are two components in the GrowNet process: a
statistics-gathering epoch and the growing of a node. The statistics-gathering epoch includes the
computation of statistics on the correlation between the error of each node and the activation
of other nodes, the estimation of the benefit of growth at each node, and the determination of
the node that will offer the greatest benefit after growing. Growing a node involves replacing
the node with a more complex node or a group of nodes. As in the SONN algorithm, three types of
nodes with different activation functions are used in this network. The pseudocode as given in
the report is as follows:

1)  Declarations:
        flag gatherstats;

2)  Initialization:
        Create net with a simple topology;
        Unset gatherstats;

3)  REPEAT
        For (each exemplar) Do
            Collect derivatives of net parameters;
            If (gatherstats is set) Then
                Collect correlation statistics;
            End If;
        End For;

        If (task solved) Then
            Exit REPEAT
        Elseif (gatherstats is set) Then
            Select most promising node;
            Grow most promising node;
            Unset gatherstats;
        Elseif (no further reduction of error likely) Then
            Set gatherstats;
        Else
            Update net parameters;
        End If;
    End REPEAT.
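
The control flow of the pseudocode can be made concrete with the following Python sketch. Only the loop structure is taken from the pseudocode; the `ops` interface (`derivatives`, `correlations`, `solved`, `stalled`, `most_promising`, `grow`, `update`) and the `ToyOps` stand-in problem are hypothetical names introduced for illustration and are not part of Smith's report.

```python
def grownet_loop(net, exemplars, ops, max_epochs=1000):
    """Run the GrowNet control loop; `ops` supplies the problem-specific steps."""
    gather_stats = False                          # 2) Unset gatherstats
    for epoch in range(max_epochs):               # 3) REPEAT
        derivs, stats = [], []
        for x in exemplars:
            derivs.append(ops.derivatives(net, x))
            if gather_stats:
                stats.append(ops.correlations(net, x))
        if ops.solved(net):
            return net, epoch
        elif gather_stats:
            node = ops.most_promising(net, stats)
            net = ops.grow(net, node)
            gather_stats = False
        elif ops.stalled(derivs):
            gather_stats = True                   # next epoch gathers statistics
        else:
            net = ops.update(net, derivs)
    return net, max_epochs


class ToyOps:
    """Hypothetical stand-in: the error cannot fall below 1/nodes until the net grows."""

    def derivatives(self, net, x):
        return net["err"] - 1.0 / net["nodes"]

    def correlations(self, net, x):
        return x

    def solved(self, net):
        return net["err"] < 0.3

    def stalled(self, derivs):
        return max(derivs) < 1e-3

    def most_promising(self, net, stats):
        return 0

    def grow(self, net, node):
        return {"nodes": net["nodes"] + 1, "err": net["err"]}

    def update(self, net, derivs):
        floor = 1.0 / net["nodes"]
        return {"nodes": net["nodes"], "err": floor + 0.5 * (net["err"] - floor)}
```

Calling `grownet_loop({"nodes": 1, "err": 2.0}, [0, 1], ToyOps())` alternates between descent and growth until the toy error threshold is met. Note that, as in the pseudocode, no parameter update is made during a statistics-gathering epoch.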

The characteristics of the different tuning parameters of this algorithm have been studied, and
its performance on the Or-NxM tasks shows promise as an improvement on the standard gradient
descent method. However, the algorithm has not been extensively tested, so its power and
shortcomings are not yet well exposed. The best features of this algorithm are its run-time
determination of the network configuration or architecture, its moderate computational cost, and
its compatibility with the standard backpropagation algorithm.

It is interesting to notice that the SONN and GrowNet algorithms are seldom used by other
researchers in the neural network community. One primary reason is that these two algorithms use
three types of nodes or computational units to build a network, which in turn introduces more
complexity and uncertainty in their implementation. Nevertheless, it is felt that these two
algorithms are not yet well explored or fully understood. There is still much work to be done in
this direction.

2.3 Higher Order Schemes for Feedforward Networks

As has been stated before, the Generalized Delta Rule in backpropagation networks performs a
gradient descent search in the weight space for the minimization of a mean-squared error function.
Therefore, it is natural to postulate that general minimization schemes are applicable to the
derivation of learning algorithms for multilayer feedforward networks. From numerical analysis,
the (steepest) gradient descent method is a first order scheme and has poor numerical properties
in terms of convergence rate and the ability to handle ill-conditioning of the system. Higher
order methods such as Newton's method and Quasi-Newton methods, by including information on the
second order derivatives, that is, the Hessian matrix of the system, have far better numerical
properties than the steepest descent method. However, the determination of higher order
information is computationally expensive and also requires more storage space. On the other
hand, the Conjugate Gradient Method or Preconditioned Conjugate Gradient Methods would provide
faster learning algorithms because of their superlinear rate of convergence and the saving in
storage space (Hageman and Young, 1981; Golub and Van Loan, 1983).

Nevertheless, before a higher order scheme is considered as a legitimate candidate in the realm of
learning algorithms, it should be derived in a form which is computationally efficient and
suitable for local implementation. It should also conserve the intrinsic parallelism of
operations of the network. In a backpropagation network, the formula for weight update is:

$$\Delta w(t) = -\epsilon\,\frac{\partial E}{\partial w(t)} + \alpha\,\Delta w(t-1)$$

where $\epsilon$ is the learning rate and $\alpha$ the momentum factor. The update of weights
proceeds either in batch mode or in on-line mode. The former refers to the update of weights only
after all the training sets have been presented to the network, and the latter refers to the
update of weights after presenting each training set. For second and higher order algorithms,
backpropagation is usually implemented in the batch mode. It thus becomes obvious that any
improvement on the learning algorithm alone should involve the adaptive determination of the two
learning parameters ($\epsilon$ and $\alpha$), whereas these two parameters are set as constants
in the standard backpropagation network. To date, numerous schemes have been proposed to improve
the learning mechanisms in backpropagation networks by incorporating higher order information of
the system or using heuristic rules to guide the adaptation of the learning parameters. In the
following paragraphs, some of the new schemes are briefly described.
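
As a point of reference for the schemes that follow, the standard weight update with a constant learning rate and momentum factor can be sketched as below. The function names and the default values of `lr` and `momentum` are illustrative assumptions; a single scalar weight is used for brevity.

```python
def momentum_step(w, grad, prev_dw, lr=0.1, momentum=0.5):
    """dw(t) = -lr * dE/dw(t) + momentum * dw(t-1); returns (new w, dw)."""
    dw = -lr * grad + momentum * prev_dw
    return w + dw, dw


def minimise(grad_fn, w, steps=200, lr=0.1, momentum=0.5):
    """Repeat the update with constant learning parameters."""
    dw = 0.0
    for _ in range(steps):
        w, dw = momentum_step(w, grad_fn(w), dw, lr, momentum)
    return w
```

For example, `minimise(lambda w: 2.0 * (w - 3.0), 0.0)` descends the error surface E(w) = (w - 3)^2. The adaptive schemes in the following subsections replace one or both of the constants with quantities computed during training.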

2.3.1 Quickprop

The Quickprop algorithm was proposed by Fahlman (1988) to improve the rate of convergence of
the backpropagation network through adaptive calculation of the momentum factor $\alpha$. It is
a second order method in a sense, based loosely on Newton's method, but it is more heuristic than
formal. The information required is the gradient of the previous training epoch and that of the
current epoch, along with the difference between the two. The weight update formula is therefore:

$$\Delta w(t) = -\epsilon\,\frac{\partial E}{\partial w(t)} + \frac{\partial E/\partial w(t)}{\partial E/\partial w(t-1) - \partial E/\partial w(t)}\,\Delta w(t-1)$$

According to Fahlman, the Quickprop algorithm is derived from two crude assumptions: 1) the
error vs. weight curve for each weight can be approximated by a parabola, and 2) the change in the
slope of the error curve, as seen by each weight, is independent of the other weights that are
changing at the same time. Though those are simple assumptions, the resulting algorithm gives a
substantial improvement in convergence rate over the standard scheme when tested on the
Encode/Decode tasks. The speedup over the standard backpropagation algorithm is about one order
of magnitude (10 times) on a small set of benchmark problems, and the algorithm seems to scale
up well.
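
The update formula can be illustrated with a small Python sketch for a single weight. It implements only the formula as written above; the guard against a zero denominator, the function names, and the default learning rate are assumptions added for the example, and the additional safeguards of Fahlman's full algorithm are not reproduced.

```python
def quickprop_step(grad, prev_grad, prev_dw, lr=0.1):
    """Gradient term plus the secant ('parabola') term scaled by the previous step."""
    dw = -lr * grad
    denom = prev_grad - grad
    if prev_dw != 0.0 and denom != 0.0:   # assumed guard: skip the secant term if undefined
        dw += grad / denom * prev_dw
    return dw


def quickprop_minimise(grad_fn, w, steps=30, lr=0.1):
    """Apply the per-weight update repeatedly to one scalar weight."""
    prev_grad, prev_dw = 0.0, 0.0
    for _ in range(steps):
        g = grad_fn(w)
        dw = quickprop_step(g, prev_grad, prev_dw, lr)
        w, prev_grad, prev_dw = w + dw, g, dw
    return w
```

On an exactly parabolic error curve the secant term alone would jump to the minimum in one step, which is why the first assumption listed above matters for the speedup.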

2.3.2 Quasi-Newton Style Methods

Direct application of the Newton method to neural networks is inappropriate for the following
reasons. First, the Newton method gains quadratic or nearly quadratic convergence only when the
starting point in the solution space is within a convex region of the function, and the size of
that neighborhood diminishes with an increasing number of variables, whereas a randomized initial
weight matrix is routinely used in training a neural network. Second, the computation of the
Hessian matrix and its inverse requires expensive computation and intensive storage (Dennis and
More, 1977). Though a Quasi-Newton method such as the BFGS (Broyden-Fletcher-Goldfarb-Shanno)
algorithm (Dennis and More, 1977; Dennis and Schnabel, 1983) has better numerical performance
than the Newton method, especially in storage usage, like the Newton method it uses global
information for the updating of weights, so its use as an efficient learning rule for large
problems in practice is questionable. In this subsection, the BFGS method is briefly described to
show the flavor of Quasi-Newton methods. Some other schemes that use information on second order
derivatives are also classified here as Quasi-Newton style methods.

2.3.2.1  BFGS  Method 

The use of the BFGS method as a learning algorithm for feedforward networks has been
investigated by Watrous (1987). In nonlinear minimization, if the objective function or the error
function is approximated as a quadratic function through Taylor's expansion, then

$$E(w + \Delta w) \approx E(w) + g^{T}\Delta w + \tfrac{1}{2}\,\Delta w^{T} G\,\Delta w$$

where $g$ is the gradient vector defined as $g = \nabla E(w)$, and $G$ is the Hessian matrix. In
the Newton Method, the minimum can be directly computed by solving the system of equations,
namely:

$$\Delta w = -G^{-1} g$$

In  a  Quasi-Newton  method,  instead  of  calculating  the  Hessian  matrix  and  its  inverse  or  solving  a 
system  of  equations,  the  inverse  matrix  of  the  Hessian  is  approximated  iteratively  with  H.  The  basic 
quasi-Newton  algorithm  consists  of  the  following  steps  (Dennis  and  Schnabel,  1983): 

1)  Calculate  a  search  direction  s  =  -  H  g; 

2)  Perform  line  search  in  the  s  direction,  that  is,  minimize  E(w)  along  s; 

3)  Update  H  using  different  schemes  such  as  the  BFGS  algorithm. 

The difference among Quasi-Newton methods lies in the utilization of different updating schemes
for $H$. The BFGS update keeps $H$ symmetric and positive definite, making the algorithm
numerically more stable than other schemes. If we define $y_i = g_i - g_{i-1}$ and
$\delta_i = w_i - w_{i-1}$, then the BFGS update is of the following form (Dennis and More, 1977),

$$H_i = H_{i-1} + \left(1 + \frac{y_i^{T} H_{i-1}\, y_i}{\delta_i^{T} y_i}\right)\frac{\delta_i \delta_i^{T}}{\delta_i^{T} y_i} - \frac{\delta_i\, y_i^{T} H_{i-1} + H_{i-1}\, y_i\, \delta_i^{T}}{\delta_i^{T} y_i}$$


In Watrous' study, the performance of the BFGS algorithm is compared with that of
backpropagation on the XOR problem and a small multiplexor problem. It was reported that the BFGS
method converged in significantly fewer iterations and had better error tolerance. However, each
BFGS iteration still requires $O(n^2)$ operations (Dennis and More, 1977), compared to $O(n)$ for
backpropagation. Moreover, because the method has not been extensively tested on different
problems, the robustness of the BFGS method is not yet well understood.
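
The three quasi-Newton steps and the BFGS update of the inverse Hessian approximation can be sketched in plain Python as follows. This is an illustrative sketch, not Watrous' implementation: the backtracking (Armijo) line search, the tolerances, and all function names are assumptions, and an exact line search would be an equally valid choice for step 2.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def bfgs_minimise(f, grad, w, iters=100, tol=1e-10):
    """Minimise f from the starting point w using the BFGS inverse-Hessian update."""
    n = len(w)
    h = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # H starts as identity
    g = grad(w)
    for _ in range(iters):
        if dot(g, g) ** 0.5 < tol:
            break
        s = [-x for x in mat_vec(h, g)]                 # 1) search direction s = -H g
        step, f0, slope = 1.0, f(w), dot(g, s)          # 2) backtracking line search along s
        while step > 1e-12 and \
                f([wi + step * si for wi, si in zip(w, s)]) > f0 + 1e-4 * step * slope:
            step *= 0.5
        w_new = [wi + step * si for wi, si in zip(w, s)]
        g_new = grad(w_new)
        d = [a - b for a, b in zip(w_new, w)]           # delta_i = w_i - w_{i-1}
        y = [a - b for a, b in zip(g_new, g)]           # y_i = g_i - g_{i-1}
        dy = dot(d, y)
        if dy > 1e-12:                                  # 3) BFGS update; skipped if curvature is unusable
            hy = mat_vec(h, y)
            scale = (1.0 + dot(y, hy) / dy) / dy
            h = [[h[i][j] + scale * d[i] * d[j] - (d[i] * hy[j] + hy[i] * d[j]) / dy
                  for j in range(n)] for i in range(n)]
        w, g = w_new, g_new
    return w
```

The nested list `h` holds the full n-by-n matrix, which makes the $O(n^2)$ cost per iteration and the storage requirement mentioned above directly visible.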


2.3.2.2 The Pseudo-Newton Algorithm

The pseudo-Newton algorithm was proposed by Becker and le Cun (1988) to approximate the
information of the second order derivatives and to include it in the learning algorithm. The
algorithm calculates only the diagonal terms of the Hessian matrix and ignores the off-diagonal
terms. From intuition on numerical analysis, the algorithm would work very well for diagonally
dominant systems. By using the absolute value of the diagonal Hessian terms, the pseudo-Newton
step for weight update is defined in the following form,

$$\Delta w = -\frac{\partial E(w)/\partial w}{\left|\partial^{2} E(w)/\partial w^{2}\right| + \mu}$$

where $\mu$ is a small value that improves the conditioning of the Hessian in regions of very
small curvature, such as at inflection points and plateaus.
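
A per-weight version of this step is easy to write down. The sketch below assumes that the gradient and the diagonal Hessian terms are supplied by the caller; the function name and the default value of `mu` are illustrative assumptions.

```python
def pseudo_newton_step(w, grad, hess_diag, mu=1e-2):
    """Return new weights: each weight moves by -g / (|h| + mu), using only diagonal h."""
    return [wi - gi / (abs(hi) + mu) for wi, gi, hi in zip(w, grad, hess_diag)]
```

For a separable quadratic such as E = (w0 - 1)^2 + 10 (w1 + 2)^2 the diagonal terms are the whole Hessian, so each step removes nearly all of the remaining error. When the off-diagonal terms are significant, this is no longer the case.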

The performance of this algorithm has been tested on the encode/decode problem, where it appears
to have a slightly faster convergence rate. However, it is reported that if the initial weights
are set to very large or very small values, the algorithm fails to converge. Besides, in some
regions where the gradient is very steep and the curvature is very shallow, the algorithm tends to
compute steps that are too large. The problem might rest on the lack of a line search routine to
accompany the use of second order information. This indicates that care should be exercised in
applying higher order methods to the learning problem because the system itself is more
statistically bound.


2.3.2.3 The Optimal Second Order Methods (OSOM)

Parker (1987) derived his optimal second order methods by using an efficient approximation to the
Newton Method in the calculation of the second derivatives. Instead of minimizing the absolute
squared error of the system, the method tries to minimize the average squared error, which is
expressed as an exponentially weighted average with time constant $\mu$. If the total squared
error is represented as $e^{T}e$, then the average squared error is defined as

$$\int_{-\infty}^{t} e^{T}e\;\mathrm{e}^{-\mu (t-\tau)}\,d\tau$$

where $\mathrm{e}$ in the exponential term is the base of the natural logarithm.

The optimality condition can be derived in two steps. First, calculate the derivative of the
average squared error with respect to the weights while holding $t$ temporarily constant, and set
the derivative to zero. Then, define the optimality criterion in terms of the optimal path through
weight space that the weights should follow as the network is trained; that is, reactivate the
time variable, calculate the derivative of the expression derived in the first step with respect
to time, and set it to zero. The resulting differential equation is then the first order optimal
algorithm. By applying numerical treatment to the first order optimal algorithm, the second order
algorithm is obtained in the following form


d2t 


de 


„  r  3€t  ,  ,deT 
"  _2a  ^aw€  (3w  0wT 


2CT 


3ze 


dwdw1 


dw  . 
"at~ 


It is easy to see that this algorithm is not simple and that the implementation scheme should be
carefully derived. To date, the implementation scheme and its performance in numerical simulation
have not been reported, though the author claimed that they would be published. Even with this
shortcoming, it is still worthwhile and insightful to look at the unique scheme used in deriving
the optimality criterion.

2.3.3 The Delta-Bar-Delta Algorithm (DBD)

It has been observed in the previous description of higher order schemes that incorporating
information on second order derivatives does not necessarily guarantee an improvement over the
backpropagation learning algorithm. Experience shows that insightful heuristics can sometimes be
beneficial if they are properly appraised. Jacobs (1988) proposed the Delta-Bar-Delta algorithm
for adapting the learning rate with consideration of local gradient information. Kesten (1958)
observed for the steepest descent method that a weight value is oscillating if consecutive changes
of the weight possess opposite signs. Saridis (1970) used this observation to increase and
decrease the learning rate in the following way: increase the learning rate if consecutive
derivatives of a weight possess the same sign, and decrease the learning rate otherwise.

The Delta-Bar-Delta algorithm is derived from the following heuristic rules: 1) each weight has
its own learning rate; 2) every learning rate should be allowed to vary over time; 3) the learning
rate for a parameter is increased if the derivative of the parameter possesses the same sign for
several consecutive time steps; and 4) the learning rate for a parameter is decreased if the
derivative of the parameter flips signs for several consecutive time steps. Based on these
heuristic rules, the scheme for modifying the learning rate is defined as follows:

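$$\Delta\epsilon(t) = \begin{cases} \kappa & \text{if } \bar{g}(t-1)\,g(t) > 0 \\ -\phi\,\epsilon(t) & \text{if } \bar{g}(t-1)\,g(t) < 0 \\ 0 & \text{otherwise} \end{cases}$$

where $g(t) = \nabla E(w(t))$ and $\bar{g}(t) = (1-\theta)\,g(t) + \theta\,\bar{g}(t-1)$. The
parameters $\kappa$, $\phi$ and $\theta$ are specified by the user.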
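
The rule for a single weight can be sketched in Python as follows. The function name and the default parameter values are illustrative assumptions, as is the choice to apply the freshly adapted learning rate within the same step; the text above fixes only how the learning rate itself is changed.

```python
def dbd_step(w, rate, bar_prev, grad, kappa=0.01, phi=0.1, theta=0.7):
    """One Delta-Bar-Delta update for one weight; returns (w, rate, bar)."""
    agreement = bar_prev * grad
    if agreement > 0:
        rate += kappa              # same sign: additive increase
    elif agreement < 0:
        rate -= phi * rate         # sign flip: multiplicative decrease
    w -= rate * grad               # gradient step with this weight's own rate
    bar = (1.0 - theta) * grad + theta * bar_prev   # exponential average of derivatives
    return w, rate, bar
```

Because the increase is additive and the decrease is proportional to the current rate, a long run of same-sign derivatives raises the rate steadily, while a single sign flip removes a fixed fraction of it.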

The performance of the algorithm has been studied on the Quadratic surfaces task, the XOR
problem, the Multiplexer problem and the Binary-to-Local problem, and the speed-up ranges from 2
times to 2 orders of magnitude depending on the nature of the problem. The problem with this
algorithm is that the learning rate sometimes grows out of control even with a small $\kappa$,
and the tuning of $\phi$ is hard and sometimes contradictory.

2.3.3.1 The Extended Delta-Bar-Delta Algorithm (EDBD)

The EDBD algorithm was designed by Minai and Williams (1990) to improve the performance of the
DBD algorithm with the following modifications: 1) the learning rate increase is made an
exponentially decreasing function of $|\bar{g}(t)|$ instead of the constant $\kappa$; 2) a
momentum term is included in the learning algorithm and the momentum is allowed to vary with time;
3) a ceiling is defined for both the learning rate and the momentum parameter; and 4) memory and
recovery are incorporated into the algorithm. With these modifications, the EDBD algorithm takes
the following form:


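$$\Delta w_{ij}(t) = -\eta_{ij}(t)\,\frac{\partial E}{\partial w_{ij}(t)} + \mu_{ij}(t)\,\Delta w_{ij}(t-1)$$

$$\eta_{ij}(t+1) = \mathrm{Min}\left[\eta_{max},\ \eta_{ij}(t) + \Delta\eta_{ij}(t)\right]$$

$$\mu_{ij}(t+1) = \mathrm{Min}\left[\mu_{max},\ \mu_{ij}(t) + \Delta\mu_{ij}(t)\right]$$

$$\Delta\eta_{ij}(t) = \begin{cases} \kappa_l \exp(-\gamma_l\,|\bar{\delta}_{ij}(t)|) & \text{if } \bar{\delta}_{ij}(t-1)\,\delta_{ij}(t) > 0 \\ -\phi_l\,\eta_{ij}(t) & \text{if } \bar{\delta}_{ij}(t-1)\,\delta_{ij}(t) < 0 \\ 0 & \text{otherwise} \end{cases}$$

$$\Delta\mu_{ij}(t) = \begin{cases} \kappa_m \exp(-\gamma_m\,|\bar{\delta}_{ij}(t)|) & \text{if } \bar{\delta}_{ij}(t-1)\,\delta_{ij}(t) > 0 \\ -\phi_m\,\mu_{ij}(t) & \text{if } \bar{\delta}_{ij}(t-1)\,\delta_{ij}(t) < 0 \\ 0 & \text{otherwise} \end{cases}$$

where $\delta_{ij}(t) = \partial E/\partial w_{ij}(t)$ and
$\bar{\delta}_{ij}(t) = (1-\theta)\,\delta_{ij}(t) + \theta\,\bar{\delta}_{ij}(t-1)$. The
parameters $\kappa_l$, $\phi_l$, $\eta_{max}$, $\gamma_l$ and $\kappa_m$, $\phi_m$, $\mu_{max}$,
$\gamma_m$, $\theta$ are specified by the user.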

The performance of the EDBD algorithm has been studied in simulations of the XOR problem and the
quadratic function problem, and the EDBD algorithm converged in all cases. In addition, it is a
more robust scheme than the DBD algorithm.
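
The adaptation of the learning rate and the momentum for one weight can be sketched as follows. The sketch covers only the increment rules and the ceilings; the memory and recovery feature is omitted, and the `Adapt` container, the function names, and the default `theta` are hypothetical names and values chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Adapt:
    """User-specified constants for one adapted quantity (learning rate or momentum)."""
    kappa: float
    gamma: float
    phi: float
    ceiling: float


def edbd_delta(value, bar_prev, bar_now, delta, p):
    """Increment for a learning rate or momentum value."""
    agreement = bar_prev * delta
    if agreement > 0:
        return p.kappa * math.exp(-p.gamma * abs(bar_now))   # shrinks as |bar| grows
    if agreement < 0:
        return -p.phi * value
    return 0.0


def edbd_update(eta, mu, bar_prev, delta, rate_p, mom_p, theta=0.7):
    """Return the capped learning rate, capped momentum and new averaged derivative."""
    bar_now = (1.0 - theta) * delta + theta * bar_prev
    eta = min(rate_p.ceiling, eta + edbd_delta(eta, bar_prev, bar_now, delta, rate_p))
    mu = min(mom_p.ceiling, mu + edbd_delta(mu, bar_prev, bar_now, delta, mom_p))
    return eta, mu, bar_now
```

The ceilings bound both quantities however long the derivative keeps its sign, which addresses the runaway learning rate noted for the DBD algorithm.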

2.3.4 Adaptive Stepsize On-line Backpropagation


Backpropagation can be implemented in batch mode or in on-line mode during training on the
data set. For second and higher order algorithms, only batch mode backpropagation is realized
because of the nature of higher order methods. Using heuristics, Chen and Mars (1990) proposed an
on-line mode backpropagation learning algorithm with stepsize adaptation. The algorithm is as
follows:

$$\alpha(t) = \alpha(t-1)\left(1 - f(t)\sqrt{E(t)}\right)$$

$$f(t) = u_1\,f(t-1) + u_2\,\Delta E(t)$$

$$\Delta E(t) = E(t) - E(t-1)$$

where $\alpha(t)$ is the stepsize for the gradient term in the weight update formula,
$\Delta E(t)$ is the decrement of $E(t)$ and $f(t)$ is a filtered version of $\Delta E(t)$. It can
easily be seen that the equation for $f(t)$ represents a first order low-pass recursive filter.
The parameters $u_1$ and $u_2$ are used to control the adaptation. For large $u_1$ and small
$u_2$, the adaptation is slow but more stable; otherwise the adaptation is fast but may lead to
oscillation. For the simulation problems, $u_1 = 0.9$ and $u_2 = 0.3$ have been used with success.

It has been pointed out that this algorithm works much better on complex problems than on simple
problems because the adaptation process needs a certain time to settle before it is fully
effective. The disadvantage of this algorithm is that it is not effective on flat regions because
the algorithm makes the weights on the hidden units saturate prematurely. To overcome this
problem, it is suggested that differential step sizes be used such that the step size for weight
updating between the hidden and output layers is larger than that between the input and hidden
layers; usually the latter is about 0.1-0.5 of the former. Though the performance of the algorithm
is far better than that of the standard backpropagation algorithm, it is a bit slower than
Quickprop proposed by Fahlman (1988).

2.3.5 Minkowski-r Backpropagation


Most connectionist learning models are routinely implemented using gradient descent on a least
squares error function, that is, the error signals are Euclidean. One may then ask: how about
deriving learning models based on a non-Euclidean error measurement? Hanson and Burr (1988)
answered this question with an elegant study of backpropagation learning using Minkowski-r power
metrics as the error measurement. The derivation of the algorithm is similar to that of
backpropagation by Rumelhart, Hinton and Williams (1986).

Using Minkowski-r power metrics, the error can be represented in the following general form:

$$E = \frac{1}{r}\sum_{i}\left|y_i - \hat{y}_i\right|^{r}$$

Then the gradient in the general Minkowski-r case is

$$\frac{\partial E}{\partial w_{hi}} = \left|y_i - \hat{y}_i\right|^{r-1}\,y_i\,(1 - y_i)\,y_h\,\mathrm{sgn}(y_i - \hat{y}_i)$$

and the weight update formula is thus

$$\Delta w_{hi}(n+1) = -\epsilon\,\frac{\partial E}{\partial w_{hi}} + \alpha\,\Delta w_{hi}(n)$$

The weight updating for the hidden layer proceeds in the same way as in the Euclidean case by
simply substituting the Minkowski-r gradient.
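
The error measure and the output-layer gradient can be sketched in Python as follows. The sketch assumes that y_i is the output of a sigmoid unit i, that the second term in the difference is its target value, and that y_h is the activation of hidden unit h; these roles and all function names are assumptions made for the illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def minkowski_error(outputs, targets, r):
    """E = (1/r) * sum_i |y_i - target_i| ** r."""
    return sum(abs(y - t) ** r for y, t in zip(outputs, targets)) / r


def minkowski_grad(y_h, y_i, target_i, r):
    """dE/dw_hi for a sigmoid output unit i fed by hidden activation y_h."""
    diff = y_i - target_i
    sgn = (diff > 0) - (diff < 0)
    return abs(diff) ** (r - 1) * y_i * (1.0 - y_i) * y_h * sgn
```

With r = 2 the expression reduces to the familiar Euclidean gradient (y_i - target_i) y_i (1 - y_i) y_h, so standard backpropagation is recovered as a special case.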

Numerical analysis shows that the behavior of the learning model changes with the parameter $r$,
because changing the value of $r$ basically results in a reweighting of errors from the output
bits. In one respect, varying the value of $r$ may be useful for various aspects of representing
information in the feature domain. For example, if the distribution of feature vectors is
non-Gaussian, then the $r = 2$ case, that is, the Euclidean error case, will not be a maximum
likelihood estimator of the weights. In fact, $r = 1$ would be right for modeling a Laplacian type
distribution. In general, for $r < 2$ it is suggested that $r = 1.5$ may be optimal for many noise
reduction problems; for $r > 2$ the metric tends to weight large deviations more heavily, so that
simpler generalization surfaces may be created. However, it is observed that the convergence time
tends to grow almost linearly with increasing $r$. In addition, the implementation of the learning
algorithm is more complex because the Minkowski-r gradient is nonlinear.

This  approach  is  unique  in  that  it  looks  at  the  same  problem  from  a  different  perspective.  Of 
course,  further  study  is  needed  to  explore  the  research  and  application  potential  in  this  direction. 

2.4 Other Approaches

In addition to the sample of algorithms described in the previous paragraphs, there are many
other algorithms which improve the learning performance of the standard backpropagation network in
one way or another. The architectural determination of backpropagation networks at run-time has
also been explored by other researchers (Bailey, 1990; Jockusch, 1990; Nevard, 1990). The learning
models have also been generalized to the complex plane to handle a special kind of problem
(Clarke, 1990; Kim and Guest, 1990). New schemes have been proposed by incorporating stochastic
training techniques (Day and Camporese, 1990; Kolen, 1988), using extrapolatory methods (Dewan and
Sontag, 1990), including fuzzy theory (Fu, 1990; Oden, 1988), and other algebraic and numerical
techniques. To illustrate the rich and fruitful research in this area, a selected reference
listing is provided in this section. For details of each algorithm, the original article should be
consulted rather than the short description provided in this section.

2.5 Theoretical Analysis of Supervised Learning Models

The previously described approaches to the architectural determination of, and learning
performance improvement on, the supervised learning models can be regarded as engineering
approaches. On the other hand, as a learning paradigm, neural networks have their intrinsic
properties, such as their dynamics, modeling capability, internal feature representation, and
generalization characteristics. The purpose of theoretical studies of these properties is to
understand the underlying properties and rules that govern the behavior, operation and reasoning
of different neural network learning models, so that their application to real world problems is
well guided. A good example is the rigorous mathematical analysis of Rosenblatt's Perceptron by
Minsky and Papert (1969), which exposed the exact limitations of a class of computing machines
that could seriously be considered as models of the brain. Mathematical analysis, though it has
its limitations, in most cases shows the elegance of logic as well as sober and rational thinking.

In a general sense, from a numerical analysis perspective, there are two salient features that
govern the operation of neural networks: the stability associated with feedback recall and the
convergence associated with supervised learning models. Global stability refers to the
stabilization of the activation patterns of a network from any input pattern, and convergence
refers to the ability to reduce the error measurement of a system given a long enough time.
Convergence can be represented in an absolute sense and in the mean square sense (Papoulis, 1965).
All the theorems concerning neural network learning are based on the definition of global
stability proposed by Lyapunov, which states that for all possible system inputs X to a dynamic
system, if X is zero only at the origin, having the first derivative defined in a given domain and
being upper-bounded, then the system that is defined by a Lyapunov energy function of the
variables of X that maps n dimensions to one will converge and become globally stable for all the
inputs X (Chetayav, 1961). There are three stability theorems, for the nonadaptive autoassociator
(Cohen and Grossberg, 1983), the adaptive autoassociator, and the adaptive heteroassociator
(Kosko, 1988), respectively (Simpson, 1990).

For multilayer feedforward networks, the general mapping ability has been proved by several
researchers, and this has generated more enthusiasm and confidence. On functional modeling,
Hecht-Nielsen (1987) uses Kolmogorov's superposition theorem to support in general the modeling
capability of a multilayer feedforward network; Gallant and White (1988) show that a three layer
network, with one hidden layer, is capable of embedding a Fourier analyzer by using the monotone
cosine squasher; and recently, Hornik, et al. (1989) proved that multilayer feedforward networks
are universal approximators. Issues like the complexity of loading shallow neural networks,
estimation of the number of neurons in the hidden layer, scaling properties, and the mathematical
theory of generalization have also been raised and extensively studied (Judd, 1990; Wolpert,
1990). Due to time limitations, the details of the different approaches to theoretical analysis,
and those of the different new learning theories, are not summarized here. The reference listing
that follows provides a fairly good picture of the current state of this area.

2.6 Recurrent Networks


Recurrent networks have emerged from the need to internally model time factors, time varying
behaviors, and sequential events. The learning model gets its name from the existence of recurrent
or feedback links in the network. Recurrent networks usually assume a multilayer architecture, or,
in a network with fully recurrent links, an unlayered form without a clear distinction between
layers. Sometimes, recurrent networks are referred to as recurrent backpropagation networks
because of their close relation to the backpropagation learning algorithm (Pineda, 1987, 1988).

The representation of sequential events can be done with a time windowing scheme (Sejnowski and
Rosenberg, 1986) or using a crude version of backpropagation through time (Rumelhart, et al.,
1986) in which the recurrent network is unfolded into a multilayer feedforward network that grows
by one layer on each time step. Backpropagation through time only works well when the time
structure of the problem is known a priori. For the time windowing scheme, Elman (1988) has
pointed out the following drawbacks: 1) some interface mechanisms are needed to buffer the input;
2) the approach does not easily distinguish relative temporal position from absolute temporal
position; and 3) the shift register imposes a rigid limit on the duration of patterns, and the
length of the input vector is fixed. All in all, time is represented explicitly and the
sequentiality is enforced onto the network rather than internally constructed. To overcome those
shortcomings, schemes have been proposed in which the representation of time is accomplished by
the effect that it has on processing. To date, there are basically three kinds of recurrent
networks proposed: Jordan's Network (Jordan, 1986); Elman's Network (Elman, 1988); and Fully
Recurrent Networks (Williams and Zipser, 1989). The application of recurrent networks has covered
an extensive domain including language processing (Behme, 1990; Giles, et al., 1990; Grajski, et
al., 1990; Liu, et al., 1990; Stolche, 1990), processing of time dependent parameters (Blumenfeld,
1990), pattern recognition and statistical classification (Wong and Vieth, 1990), learning
stochastic sequences (McCuloch, 1990), vision (Qian and Sejnowski, 1989), and solving constraint
problems (Schaller, 1990).

In the following paragraphs, the architecture and operation of the three basic recurrent networks
are described, and some references on the theoretical analysis of recurrent networks and their
applications are listed.


2.6.1  Jordan’s  Network 

A Jordan network is a layered feedforward network with recurrent connections from the output
layer to a section of the input layer. The recurrent connections copy the output at the previous time
step to the input at the current time step. With a single hidden layer, the hidden units thus see the
network’s own previous output, and this knowledge influences the subsequent behavior. The recurrent
connections are not trainable, so the network can be trained directly with the standard
backpropagation learning algorithm. However, the presence of nontrainable recurrent connections
limits the richness of the time sequence representation. The architecture of a Jordan network is shown
in Fig. 2.5.
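
As an illustration of the copy-back mechanism, the forward pass of a Jordan network can be sketched
as follows. This is only a minimal sketch, not Jordan's original formulation: the logistic activation,
the omission of bias terms, the zero initial state, and the names jordan_forward, W_in, W_state and
W_out are all assumptions made for the example.

```python
import numpy as np


def sigmoid(s):
    """Logistic activation, bounded between 0.0 and 1.0."""
    return 1.0 / (1.0 + np.exp(-s))


def jordan_forward(xs, W_in, W_state, W_out):
    """Run a one-hidden-layer Jordan network over a sequence of input vectors.

    W_in    : hidden x input weights (trainable)
    W_state : hidden x output weights applied to the copied-back output (trainable)
    W_out   : output x hidden weights (trainable)
    The copy from the output layer to the state units is fixed and not trained.
    """
    state = np.zeros(W_out.shape[0])      # state units start empty (assumed)
    outputs = []
    for x in xs:
        hidden = sigmoid(W_in @ x + W_state @ state)
        y = sigmoid(W_out @ hidden)
        outputs.append(y)
        state = y.copy()                  # fixed copy-back link: output -> state units
    return np.array(outputs)
```

Because the copy-back link is fixed, each time step looks like an ordinary feedforward pass whose input
has been extended by the state units, which is why standard backpropagation can be applied.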

2.6.2  Elman’s  Network

An  Elman  network  is  a  modification  of  the  Jordan  network  by  introducing  a  set  of  context  units  in 
the  input  layer  and  making  the  recurrent  connections  from  hidden  layer  to  the  context  units  in  the 
input  layer.  The  architecture  of  an  Elman  network  is  shown  in  Fig.  2.6.  Like  that  in  Jordan’s  network, 
the  recurrent  links  are  not  trainable  and  the  activations  of  context  units  at  current  time  are  merely  a 
copy  of  those  of  the  hidden  units  at  previous  time  if  a  three  layer  feedforward  network  with  one  hidden 
layer  is  used.  The  training  of  the  network  is  accomplished  via  backpropagation  learning  algorithm  and 
the  initial  activations  of  the  context  units  are  set  at  0.5  when  the  activation  function  is  bounded  in  0.0  to 
1.0. 
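
The role of the context units can be sketched in the same way. This is a minimal sketch under stated
assumptions: a logistic activation, no bias terms, and the hypothetical names elman_forward, W_in,
W_ctx and W_out. The initial context value of 0.5 follows the text.

```python
import numpy as np


def sigmoid(s):
    """Logistic activation, bounded between 0.0 and 1.0."""
    return 1.0 / (1.0 + np.exp(-s))


def elman_forward(xs, W_in, W_ctx, W_out):
    """Run a three-layer Elman network over a sequence of input vectors.

    W_in  : hidden x input weights (trainable)
    W_ctx : hidden x hidden weights from context units to hidden units (trainable)
    W_out : output x hidden weights (trainable)
    The copy from hidden units to context units is fixed and not trained.
    """
    context = np.full(W_ctx.shape[0], 0.5)   # initial context activations
    outputs, hiddens = [], []
    for x in xs:
        hidden = sigmoid(W_in @ x + W_ctx @ context)
        outputs.append(sigmoid(W_out @ hidden))
        hiddens.append(hidden)
        context = hidden.copy()              # context = previous hidden activations
    return np.array(outputs), np.array(hiddens)
```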


Though Elman’s modification of the Jordan network appears minor at first, it has infused a new
representation scheme into the network. Because the features of the input-output mapping are
represented in the hidden units, the context units actually supply the network with an identification
of its previous state. The network still has the shortcoming of a Jordan network, and the features
captured in the context units may not be crisp enough to give a well-defined state identification if
the training data sets are contaminated by noise.


2.6.3  Fully  Recurrent  Networks 

Figure 2.6 - Architecture of an Elman Recurrent Network

In a fully recurrent network, the concept of layering is lost because each neuron is connected to
every other neuron in the network and each one can function as both an input and an output unit. For
this kind of network, a gradient-following learning procedure called real time recurrent learning (RTRL)
has been proposed to suit the architectural complexity of the network (Williams and Zipser, 1989).
With RTRL, the network runs continually in the sense that it samples its inputs on every update cycle,
and any unit can receive training signals on any cycle. In addition, it can solve problems requiring an
input of arbitrary length, and it correctly assigns credit to past events. The major drawbacks of
training this kind of network via RTRL are that training is computationally very expensive and that
the learning procedure is non-local.
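
The bookkeeping behind RTRL can be sketched as follows. This is a simplified sketch rather than the
procedure exactly as published: it assumes logistic units, a target for every unit at every time step,
and weights held fixed while the gradient is accumulated (so that the result can be checked against the
sequence loss). The names sequence_loss and rtrl_gradient are hypothetical. The tensor P holds the
sensitivity of every unit output to every weight, which is where the heavy computational cost arises.

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def sequence_loss(W, xs, targets, n):
    """Summed squared error of an n-unit fully recurrent network over a sequence."""
    y = np.zeros(n)
    loss = 0.0
    for x, d in zip(xs, targets):
        z = np.concatenate([y, x, [1.0]])   # unit outputs, external inputs, bias
        y = sigmoid(W @ z)
        loss += 0.5 * np.sum((d - y) ** 2)
    return loss


def rtrl_gradient(W, xs, targets, n):
    """Gradient of sequence_loss w.r.t. W, accumulated forward in time."""
    y = np.zeros(n)
    P = np.zeros((n,) + W.shape)            # P[k, i, j] = d y_k / d w_ij
    grad = np.zeros_like(W)
    idx = np.arange(n)
    for x, d in zip(xs, targets):
        z = np.concatenate([y, x, [1.0]])
        y = sigmoid(W @ z)
        # Propagate old sensitivities through the recurrent weights ...
        rec = np.einsum('kl,lij->kij', W[:, :n], P)
        # ... and add the direct effect of w_ij on unit i (Kronecker delta term).
        rec[idx, idx, :] += z
        P = (y * (1.0 - y))[:, None, None] * rec
        grad += np.einsum('k,kij->ij', y - d, P)
    return grad
```

In an on-line setting the weights would instead be adjusted by a small step against this gradient on
every cycle. P has n x n x (n + m + 1) entries for n units and m external inputs, and updating it
involves a sum over all units for each entry, which illustrates why the method scales poorly.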

2.6.4  Others’  Work 

There are, of course, many improvements and new schemes based on Jordan’s and Elman’s work.
Mozer (1988) improved the performance of the network by adding a layer of units that each have a
single self-recurrent connection trained by a true gradient-following learning rule. Pearlmutter (1988)
proposed a scheme to improve the performance of backpropagation through time, and Almeida (1987),
Pineda (1988), and Rohwer & Forrest (1987) all derived versions of the recurrent network in which the
network’s actual and desired dynamics settle to a fixed equilibrium state on each training cycle.
Recently, a higher order recurrent network has been proposed by Giles, et al. (1990).

It  can  easily  be  seen  that  recurrent  networks  will  have  strong  potential  in  modeling  the  time  varying 
behaviors  in  engineering.  In  material  modeling,  recurrent  networks  have  been  proposed  to  model  the 
mechanical behavior of engineering materials under cyclic loading (Wu, 1990). The challenge lies in the
identification  of  suitable  problems  and  also  the  development  of  new  and  efficient  learning  algorithms 
for  this  kind  of  network.  The  reference  listing  provides  information  on  both  theoretical  analysis  and 
applications. 

2.7  Unsupervised  Learning  Systems


Unsupervised learning or self-organizing systems are primarily inspired by our understanding of
human information processing gained from extensive research on the psychological and biological
processes of cognition. In unsupervised learning, the network is not taught the regularities and
relations in the set of training patterns; instead, the network captures the regularities of the input
vectors by using unsupervised learning procedures. This phenomenon is remarkable because it provides a
computational model with a natural resemblance to the cognitive process of human beings. The capability
of self-organization also makes this kind of network a powerful tool for real-time pattern
classification and signal processing applications where the target classifications are not known a
priori yet the data can be sorted into different categories. On the other hand, stability theory
concerning the computational properties of self-organizing networks shows that the stabilization of the
system is directly related to the minimization of an energy measure of the system, so self-organizing
systems have great potential in solving constrained optimization problems. To date, Hopfield’s network
(Hopfield, 1982) and Kohonen’s self-organizing network (Kohonen, 1984) have been successfully used for
some combinatorial optimization problems in different fields, including the traveling salesman problem
(TSP).

In unsupervised learning, self-organization is imparted via learning rules based on Hebbian learning
(Hebb, 1949) and competitive learning (Grossberg, 1976; Rumelhart and Zipser, 1986) or variants
of both. In Hebbian learning, the modification of weights is based on the correlation between the
presynaptic and postsynaptic activity of a neuron. The connection weight is increased if the
correlation is positive (excitation); otherwise it is decreased. Competitive learning is a
pattern classification procedure for conditioning intra-layer connections in a two layer network such
that the input vectors are properly classified into distinct clusters. Competition and inhibition are two
basic mechanisms that provide dynamics to the system. For example, competitive layers and inhibitory
connections are salient features of the Kohonen network, the Counterpropagation network (Hecht-Nielsen,
1987) and Adaptive Resonance Theory (Grossberg, 1976; Carpenter and Grossberg, 1986).

In  this  section,  a  brief  overview  is  provided  on  some  widely  used  unsupervised  learning  models  and 
some recent developments in the area. Note that the reference listing does not necessarily include the
early work on the analysis of different learning paradigms; such information can be found in numerous
recently published books on neural networks.


2.7.1  Hebbian  Learning  Rule 


In  self-organizing  systems,  most  of  the  learning  rules  for  modifying  the  connection  strengths  of 
existing  connections  are  a  variant  of  Hebbian  learning.  Hebbian  learning  is  a  correlation  rule  based  on 
observations  from  physiological  and  psychological  studies  on  cognition.  In  his  book,  Organization  of 
Behavior  (1949),  Hebb  states  that: 

When  an  axon  of  cell  A  is  near  enough  to  excite  a  cell  B  and  repeatedly  or  persistently  takes 
part  in  firing  it,  some  growth  process  or  metabolic  change  takes  place  in  one  or  both  cells  such 
that A’s efficiency as one of the cells firing B is increased.

In simple terms, we can state Hebb’s rule as: if a unit receives an input from another unit and both
are active, the connection weight between these two units should be increased. This simple learning
rule is usually put in the following mathematical form:

$$\Delta w_{ij} = \eta \, a_i \, a_j$$

where $\eta$ is the learning rate, $a_i$ and $a_j$ are the activations of the two units, and
$\Delta w_{ij}$ is the modification of the connection weight.

One disadvantage of using this simple correlation learning rule is that it is not goal bounded like
the delta rule. Different versions of the Hebbian style learning rule have been proposed by many
researchers. Sejnowski (1977) uses the covariance correlation to replace the simple correlation such
that

$$\Delta w_{ij} = \eta \, (a_i - \bar{a}_i)(a_j - \bar{a}_j)$$

where $\bar{a}_i$ is the mean value of $a_i$. Sutton and Barto (1981) use the correlation of the mean
value and the variance in the learning rule in the following form:

$$\Delta w_{ij} = \eta \, \bar{a}_i \, (a_j - \bar{a}_j)$$

Klopf (1986) introduced drive-reinforcement learning by using the correlation in the changes of
activation such that

$$\Delta w_{ij} = \eta \, \Delta a_i \, \Delta a_j$$

Of course, combinations of the above schemes also produce some new schemes, such as the one proposed
by Cheung and Omidvar (1988):

$$\Delta w_{ij} = \eta \, a_i \, w_{ij} \, \Delta a_j$$

A major improvement on Hebbian learning is the introduction of a decay effect, which has been
illustrated by Grossberg (1968), Hopfield (1984) and others. The learning rule is:

$$\Delta w_{ij} = -w_{ij} + F(a_i) \, F(a_j)$$

in which $F$ is the activation function. Based on this, Kosko (1986) proposed his differential Hebbian
learning in the following way:

$$\Delta w_{ij} = -w_{ij} + F'(a_i) \, F'(a_j)$$

where $F'$ is the derivative of $F$ with respect to the activation value.
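
To make the differences between these variants concrete, the simple Hebbian rule, the covariance rule
and the decay rule can be sketched as weight-update functions. This is a minimal sketch only: the
function names, the matrix layout (rows index the first unit, columns the second) and the use of tanh as
a stand-in for the activation function F are assumptions made for the example.

```python
import numpy as np


def hebb_update(a_i, a_j, eta=0.1):
    """Simple Hebbian rule: dw_ij = eta * a_i * a_j."""
    return eta * np.outer(a_i, a_j)


def covariance_update(a_i, a_j, mean_i, mean_j, eta=0.1):
    """Covariance rule: dw_ij = eta * (a_i - mean_i) * (a_j - mean_j)."""
    return eta * np.outer(a_i - mean_i, a_j - mean_j)


def decay_update(W, a_i, a_j, F=np.tanh):
    """Hebbian rule with decay: dw_ij = -w_ij + F(a_i) * F(a_j)."""
    return -W + np.outer(F(a_i), F(a_j))
```

With non-negative activations the simple rule can only increase weights, whereas the covariance rule
decreases a weight when one unit is above its mean and the other below it. The decay rule leaves the
weights unchanged once each weight equals the product F(a_i) F(a_j).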

2.7.2  The  Competitive  Learning  Architecture 

The simplest form of competitive learning network consists of two layers: the input layer for
receiving input patterns and the competitive layer for classifying the input vectors (Rumelhart and
Zipser, 1986). The weights are usually limited to the interval (0, 1), and the sum of the weights to a
unit is always 1. The competition in the competitive layer is accomplished through a winner-takes-all
scheme, in which the unit with the highest weighted input sum is assigned as the winner. The activation
of the winner is then set at 1.0 and the remaining units are given the value of 0.0. The
winner-takes-all scheme can be represented in the following form:

$$a_j = \begin{cases} 1 & \text{if } N_j > N_k \text{ for all } k \neq j \\ 0 & \text{otherwise} \end{cases} \qquad \text{with} \quad N_j = \sum_i a_i w_{ij}$$

where, inside the sum, $a_i$ is the activation value of input unit $i$. The weight updating is made
after the winner is selected, and only the weights that correspond to the connections to the winner are
updated in the following form:

$$\Delta w_{ij} = \eta \left( \frac{a_i}{n} - w_{ij} \right)$$

in which $\eta$ is the learning rate, and $n$ is the number of units in the input layer that have
activation levels of 1.0. The simple architecture of the network is shown in Fig. 2.7.
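
One training step of this winner-takes-all scheme can be sketched as follows. It is a minimal sketch
assuming binary input activations with at least one active unit; the function name competitive_step and
the layout W[i, j] (weight from input unit i to competitive unit j) are choices made for the example.

```python
import numpy as np


def competitive_step(W, a, eta=0.1):
    """One competitive learning step for a binary input vector a.

    W[i, j] is the weight from input unit i to competitive unit j;
    each column of W is assumed to sum to 1.
    """
    net = a @ W                          # N_j = sum_i a_i * w_ij
    winner = int(np.argmax(net))         # winner-takes-all (first unit wins ties)
    n_active = a.sum()                   # number of input units at 1.0
    W = W.copy()
    # Only the weights into the winner are changed.
    W[:, winner] += eta * (a / n_active - W[:, winner])
    return W, winner
```

Since the terms a_i / n sum to 1 over the input units, the update shifts weight from inactive to active
input lines while keeping the sum of the winner's weights equal to 1.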


Figure  2.7  -  A  Simple  Architecture  of  a  Competitive  Learning  Network 


It should be pointed out that the competitive layer can also be implemented with inhibition
connections instead of the simple winner-takes-all scheme. With full or lateral inhibition connections, the
activation  levels  of  the  processing  units  gradually  relax  to  the  point  where  the  unit  with  the  highest 
incoming  sum  remains  activated  so  that  it  is  chosen  as  the  winner.  Such  a  system  would  be  more 
biologically  plausible. 


2.7.3  The  Hopfield  Network 

Hopfield introduced the binary version of the network in 1982 and later extended it to treat analog
values in 1984. The basic structure and operation of the two versions of the Hopfield network are
essentially the same. For simplicity, the binary version is described here.

The Hopfield network is a single layer network in which each unit is connected to every other unit.
The network is recursive because the output of each unit feeds into the inputs of the other units in
the same layer. The weight matrix is symmetric so that the network is able to converge to a stable
state. Each unit in a Hopfield network has a binary activation value or state, that is, one of the two
binary states. The state of the network at a given moment is represented by a state vector containing
the activation value of each unit. The state of the network changes over time until it finally settles
to a stable state; at this moment, the corresponding energy measure of the network reaches its optimum
value, which is usually a local minimum of the energy function. To find the global minimum of the
energy function that the system represents or corresponds to, a restarting scheme or the use of
Boltzmann or Cauchy machines is needed.

Two processes are involved in the self-organizing process of a Hopfield network, namely,
the setting of connection weights and the updating of the state vector. The connection strengths or
weights are usually preset (wired in) rather than obtained through training. After the state vector is
initialized, the updating of the state vector follows a very simple procedure. For each unit, calculate
the weighted sum of its inputs. If the sum is larger than or equal to zero, set the activation of the
unit to 1.0; otherwise set the activation value to 0.0. Selection of the next unit for updating can be
done sequentially or randomly. This process continues through all the units in the network until a
stable state of the network is reached. The architecture of a Hopfield network is shown in Fig. 2.8.
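
The updating procedure can be sketched as follows. The text only says that the weights are preset; the
sketch assumes they are set by the outer-product prescription of Hopfield (1982) with a zero diagonal,
and it uses sequential updating. The function names store_patterns, energy and recall are hypothetical.

```python
import numpy as np


def store_patterns(patterns):
    """Preset symmetric weights from 0/1 patterns (outer-product rule, assumed here)."""
    bipolar = 2 * np.asarray(patterns) - 1       # map 0/1 to -1/+1
    W = (bipolar.T @ bipolar).astype(float)      # sum of outer products over patterns
    np.fill_diagonal(W, 0.0)                     # no self-connections
    return W


def energy(W, s):
    """Energy measure of state s for zero thresholds."""
    s = np.asarray(s, dtype=float)
    return -0.5 * s @ W @ s


def recall(W, s, max_sweeps=100):
    """Sequentially update units until a full sweep changes nothing."""
    s = np.array(s, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            new = 1.0 if W[i] @ s >= 0 else 0.0  # threshold the weighted input sum at zero
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s
```

For example, with the single stored pattern [1, 0, 1, 0, 1, 0], starting recall from [1, 0, 0, 0, 1, 0]
settles back to the stored pattern, and the energy of the final state is lower than that of the
starting state.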


2.7.4  The  Kohonen  Self-Organizing  Network 

The Kohonen self-organizing network was specially designed by Kohonen (1984) for regularity
determination and feature extraction in the input patterns. Like the competitive learning network, the
Kohonen network usually comes in two layers, an input layer for receiving input patterns and a
competitive layer for processing the input information. Input patterns are classified by the units that
they activate in the competitive layer, and the activation patterns of the competitive layer represent
the identification made by the network. The competitive layer is commonly organized as a two
dimensional grid. Full connections from the input layer to the competitive layer are enforced, and the
connection strengths or weights are initialized with random values. The general architecture of a
Kohonen network is shown in Fig. 2.9.

Though the architecture of a Kohonen network is similar to that of a competitive learning
architecture, the self-organizing process or learning in the competitive layer is accomplished via a
different criterion. First, a matching value that measures the closeness of the weight vector of each
unit in the competitive layer to the input pattern vector is calculated by

$$\| P - W_i \| = \left[ \sum_j (p_j - w_{ij})^2 \right]^{1/2}$$

in which $P$ is the input vector containing the activations of the input nodes, and $W_i$ is the weight
vector of unit $i$ in the competitive layer with connections to all the nodes in the input layer. The
winning node is the one with the minimum matching value. After identifying the winning node, the next
step is to select the neighborhood of nodes around the winner for weight updating. Only the weights of
those nodes in the winning neighborhood are modified with the following equation:

$$\Delta w_{ij} = \begin{cases} \alpha \, (p_j - w_{ij}) & \text{if unit } i \text{ is in the winning neighborhood} \\ 0 & \text{otherwise} \end{cases}$$

where $\alpha$ is the learning rate. Usually, the learning rate decreases as the training proceeds.
Similarly, the winning neighborhood in the competitive layer can be given a relatively large width
initially and then reduced in size with further training.
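
One training step of this procedure can be sketched as follows. This is a minimal sketch: the square
neighborhood (all units within a given number of grid steps of the winner), the function name som_step
and the argument names are assumptions made for the example.

```python
import numpy as np


def som_step(W, grid, p, alpha, radius):
    """One Kohonen training step.

    W[i]    : weight vector of competitive unit i
    grid[i] : (row, column) position of unit i on the two dimensional grid
    p       : input vector
    """
    dist = np.linalg.norm(W - p, axis=1)          # matching value |P - W_i| for every unit
    winner = int(np.argmin(dist))                 # minimum matching value wins
    # Square neighborhood around the winner (an assumed neighborhood shape).
    in_hood = np.abs(grid - grid[winner]).max(axis=1) <= radius
    W = W.copy()
    W[in_hood] += alpha * (p - W[in_hood])        # dw_ij = alpha * (p_j - w_ij)
    return W, winner
```

In a full training run, som_step would be called repeatedly over the input patterns while alpha and
radius are gradually decreased.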

It  is  clear  that  a  Kohonen  network  performs  a  feature  mapping  between  the  input  pattern  and  the 
representing  weight  vector  and  identifies  the  input  pattern  with  activation  pattern  in  the  competitive 
layer.  The  feature  mapping  capability  makes  this  network  suitable  for  applications  in  sensory  motor 
control,  language  processing  and  constraint  optimization. 

2.7.5  Other  Unsupervised  Learning  Models


There are many other unsupervised learning models such as ART - the Adaptive Resonance
Theory proposed by Grossberg and Carpenter (1976, 1987, 1988, 1990), which is a two layer,
nearest-neighbor classifier that stores an arbitrary number of spatial patterns using competitive
learning; BAM - the Bidirectional Associative Memory introduced by Kosko (1987, 1988), which uses
Hebbian learning to encode arbitrary spatial pattern pairs in a two layer, heteroassociative pattern
matcher; and FAM - Fuzzy Associative Memory, also proposed by Kosko (1987). Another interesting
architecture is Hecht-Nielsen’s Counterpropagation Network (1987). A counterpropagation network is
actually a hybrid three layer network, in which the hidden layer is a Kohonen competitive layer with
unsupervised learning and the rest are standard backpropagation layers trained with the Generalized
Delta Rule. Owing to time limitations, we cannot describe all the other networks in this survey. The
following list of references provides pointers to recent developments in unsupervised learning
paradigms and related works.

3  ENGINEERING  APPLICATIONS


3.1  General 

To date, the resurgence of research in neural networks has not only advanced the technology in
some branches of networks and resulted in sophisticated modeling tools but has also generated various
applications in many disciplines. In addition to intensive research and application in the traditional
areas such as the modeling of cognitive processes, pattern recognition, and language processing, a
strong trend is seen in the application of neural networks to real engineering problems. As a
by-product of this endeavor, many innovative engineering approaches have also been introduced to the
development of neural network based modeling systems.

Because  of  the  characteristics  associated  with  neural  network  modeling,  there  are  several  kinds  of 
engineering  problems  that  are  suitable  for  this  technology.  As  a  knowledge  representation  tool,  neural 
networks  can  be  used  in  modeling  the  behavior  of  engineering  material  in  structural  engineering  and 
computational  mechanics,  system  identification,  control,  and  prediction.  As  a  computational  tool, 
Hopfield-type  networks  have  found  extensive  applications  in  planning,  scheduling,  and  optimization. 
As  a  classifier  and  pattern  matcher,  neural  networks  can  provide  an  alternative  in  solving  problems 
associated  with  pattern  recognition,  diagnostics,  maintenance,  and  image  processing. 

It  should  be  realized  that  neural  networks  are  only  a  tool  for  some  specially  suited  problems,  and 
there  is  still  a  long  way  to  go  before  this  technology  becomes  a  sophisticated  entity  in  the  tool  box  for 
engineers.  The  current  application  of  neural  networks  is  more  of  an  art  than  a  science  because  the 
implementation process involves a lot of heuristics and engineering judgement. The success of an
application, in some ways, depends on the modeler’s understanding of the problem and the selection of
certain  parameters  in  using  a  neural  network.  At  this  stage,  the  most  effective  use  of  this  technology 
may  rest  on  the  development  of  systems  combining  a  traditional  approach  with  this  technology. 

In the following paragraphs, some of the applications in different engineering fields are described
and relevant references provided. As noted before, the reference list is by no means exhaustive,
and we owe an apology to those researchers whose work has been overlooked due to time limits on
preparing this report.

3.2  Hybrid  Systems  and  Their  Applications 

As mentioned before, neural networks reside on the middle ground between a purely mathematically
based engineering approach and the symbol-dominated AI approach. A system combining the
computational capability of neural networks and the deep reasoning ability of KBES (Knowledge-based
Expert Systems) would most likely offer new insight and powerful tools for engineering problem
solving. The objective of building hybrid expert systems, that is, integrating neural networks and
expert systems, is to exploit the advantages and neutralize the disadvantages of both systems.

In a hybrid expert system, neural networks usually function as a classifier for data evaluation,
regularity detection and classification or as an optimizer for solving a multiconstraint optimization
problem in KBES. In the last few years, hybrid systems have been successfully constructed for planning,
scheduling, diagnosis, and decision making. In general, as summarized by Rabelo, Alptekin and Kiran
(1990), the integration of Neural Networks and KBES takes these forms: 1) Neural Networks are used
for knowledge representation, and the represented knowledge is then translated into rules for a KBES for
symbolic manipulation, 2) a KBES is used to obtain a preliminary solution, which is then optimized by
Neural Networks, and 3) Neural Networks are used within a KBES to perform tasks for which explicit rules
would be too complex to build.

There are, of course, many more variations on the design of a hybrid system. In the following
paragraphs, some of the typical approaches to the construction of a hybrid system are described, and
an overview of their applications as well as references on the development of Connectionist Expert
Systems (CES) are provided. It should be pointed out that CES are different from the simple
integration of KBES with neural networks; rather, CES are standalone systems capable of rule
extraction and generalization within themselves (Gallant, 1988). Some research works on novel
approaches to rule extraction from connectionist systems are also included for reference purposes.

3.2.1  The  Three-Stage  Integration  -  Hillman  (1990) 

According  to  Hillman  (1990),  three  stages  are  involved  in  building  a  hybrid  expert  system: 

1.  The  data  and  information  obtained  are  preprocessed  by  the  expert  system; 

2. The preprocessed data are filtered through a neural network for evaluation, regularity detection
and classification;

3.  Results  from  the  neural  network  are  analyzed  and  synthesized  by  the  expert  system. 

The advantage of using a neural network as a data evaluator and regularity detector is the
simplification of the rule building process in data evaluation, which reduces the execution time. It is
apparent that software tools for interfacing are of primary importance in the integration process of
the hybrid system. In Hillman’s toy problem, two commercially available packages, AUBREY and NeuroShell
(Ward Systems Group, Inc.), were used. This approach is illustrated in Fig. 3.1.

3.2.2  Two-Stage  Integration  -  Benachenhou,  et  al.  (1990)

Similar  to  the  three-stage  approach,  the  two-stage  approach  usually  takes  the  following  forms:  1)  a 
neural  network  works  as  a  preprocessor  for  an  expert  system,  and  2)  another  neural  network  works  as  a 
postprocessor  on  results  from  the  expert  system. 

In this paper, a hybrid system consisting of a feature-based knowledge system and an ART1
(Carpenter and Grossberg, 1987) network is used for the inverse problem of image processing, that is,
sorting images by clustering instead of extracting features from images by clustering as in the direct
image processing problem. The knowledge based system provides a training environment by giving a few
known features of different images, and the ART1 network then sorts those images into an unknown number
of classes by using its clustering capability. The advantage of using ART1 is that it can work on an
open set of samples, whereas for the Kohonen network the number of clustering groups must be
known a priori. Thus the system can be used to classify unfamiliar images into new classes.

The system is applied to clustering a small set of primers among a large open set generated by a
rule based system and uses the sorted results in the diagnosis of AIDS virus-mutated DNA by a
recombinant DNA technology called Polymerase Chain Reaction (PCR). In the test results, the hybrid
architecture was able to select the leaders of image clustering, and the system is currently being
evaluated under practical medical conditions.

3.2.3  Flexible  Manufacturing  Systems  -  Rabelo,  et  al.  (1990)

According to the authors’ definition, flexible manufacturing systems (FMS) are automated
manufacturing systems consisting of numerical control, machine tools, material handling devices,
automated inspection stations, in-process storage areas, and a computational scheme to provide database
handling, supervisory, and monitoring functions. A hybrid system that integrates neural networks and
KBES is proposed to solve the real time scheduling problem of a flexible manufacturing system.
Feedforward neural networks are used as prediction tools and scheduling pattern recognition mechanisms,
and KBES are utilized as the higher order members that interact with other elements of the FMS
hierarchy, providing guidance for problem solving strategy, monitoring the performance of the system,
and automating the learning process of the neural networks. The architecture of the system is shown
schematically in Fig. 3.2.

It  appears  that  the  architecture  proposed  in  the  article  has  direct  applicability  to  scheduling  and 
planning  problems  in  construction  engineering.  The  question  is  how  effective  the  learning  units  will  be 
on  large  complex  data  sets  and  how  efficient  the  interfaces  between  expert  system  and  neural  networks 
are  in  real  application. 

3.2.4  Task  of  Ordering  -  Becker  and  Peng  (1987) 

This paper discusses the use of activation networks for analogical reasoning in the task of ordering
alternatives. The scheme for integrating the activation networks with a KBES for symbolic processing
is also outlined. The activation network is designed to represent analogical reasoning for problem
solving. There are three layers of nodes in the network, and the characteristics of each layer are as
follows: 1) the input layer represents problem attributes, 2) the hidden or middle layer represents old
solutions, and 3) the output layer represents choice alternatives. A connection, with its connection
strength, is established between an input node and a hidden node if the input attribute contributes to
the corresponding solution in the hidden layer, and the connection strengths between the hidden layer
and the output layer are assigned identity values. Learning schemes such as parameter-adjusting
learning can be used to adjust the weights between the input and hidden layers. No learning example is
shown in the paper. The integration of an activation network with an expert system is shown
schematically in Fig. 3.3.

3.2.5  Delivery  Truck  Dispatching  -  Bigus  and  Goosbey  (1990)

This article describes the application of a self-organizing network, together with database and
knowledge-based systems, to the problem of dispatching delivery trucks under weight and volume
constraints so as to minimize the number of trucks required and the total distance each truck must
travel. The problem solving process involves four steps: 1) reading the data from a customer and
delivery database and determining the minimum number of trucks required using KnowledgeTool, a
rule-based expert system, 2) using KnowledgeTool rules for the initial assignment of deliveries to
trucks, 3) using KnowledgeTool to improve the assignments by swapping deliveries between trucks to
reduce the travelling distance, and 4) solving each truck’s delivery route using a variation of the
elastic net proposed by Angeniol, et al. (1988) based on feature maps. The problem solving process is
shown in Fig. 3.4.
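
Step 4 can be illustrated with a much simplified ring-shaped self-organizing map. The sketch below
is not the algorithm of Angeniol, et al., which creates and deletes nodes dynamically; it assumes a
fixed number of ring nodes, and the function name, decay schedules, and stop coordinates are all
hypothetical. It returns a visiting order and makes no claim that the order is the shortest one.

```python
import math
import random


def som_route(stops, n_nodes=None, epochs=200, seed=0):
    """Order delivery stops with a ring-shaped self-organizing map (simplified)."""
    rng = random.Random(seed)
    n_nodes = n_nodes or 3 * len(stops)
    # Start the ring as a small circle around the centroid of the stops.
    cx = sum(x for x, _ in stops) / len(stops)
    cy = sum(y for _, y in stops) / len(stops)
    ring = [[cx + 0.1 * math.cos(2 * math.pi * k / n_nodes),
             cy + 0.1 * math.sin(2 * math.pi * k / n_nodes)] for k in range(n_nodes)]
    radius, rate = n_nodes / 4.0, 0.8      # assumed neighborhood width and learning rate
    for _ in range(epochs):
        for x, y in rng.sample(stops, len(stops)):
            winner = min(range(n_nodes),
                         key=lambda k: (ring[k][0] - x) ** 2 + (ring[k][1] - y) ** 2)
            for k in range(n_nodes):
                d = min(abs(k - winner), n_nodes - abs(k - winner))  # distance along the ring
                g = rate * math.exp(-(d * d) / (2 * radius * radius))
                ring[k][0] += g * (x - ring[k][0])
                ring[k][1] += g * (y - ring[k][1])
        radius = max(radius * 0.97, 0.5)   # shrink the neighborhood gradually
        rate = max(rate * 0.99, 0.05)

    def nearest(stop):
        return min(range(n_nodes),
                   key=lambda k: (ring[k][0] - stop[0]) ** 2 + (ring[k][1] - stop[1]) ** 2)

    # Read the tour off the ring: visit stops in the order of their nearest ring node.
    return sorted(range(len(stops)), key=lambda i: nearest(stops[i]))


stops = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 1.5)]
route = som_route(stops)
```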

3.2.6  Object  Recognition  in  Image  Processing  -  Glover,  et  al.  (1990) 

This article describes a hybrid system for object recognition in image processing. The system,
which is composed of neural networks and a rule-based pattern recognition system, is capable of
self-modification or learning through a feedback loop between the neural networks and the rule-based
system. Thus the neural networks can be automatically trained and modified by the rule-based system,
and the rule-based system can modify models in its knowledge base from information supplied by the
neural networks. The schematic diagram of the hybrid system architecture is shown in Fig. 3.5.

3.2.7  Manipulator  Control  System  -  Handelman,  Lane  and  Gelfand  (1990)

This article describes a hybrid system combining neural networks and rule-based systems for robot
control problems. As in the system proposed by Glover, et al. (1990), the rule-based system interacts
with and monitors the learning and performance of the neural network module. The training of the
network can then be completed on line, so that an autonomous learning system is defined. The system
works in the following way: 1) initially, a rule-based system is used to come up with acceptable
first-cut solutions to the given control objectives, 2) the rule-based system then teaches a neural
network how to accomplish parts of the learning task, and 3) after that, the rule-based system
interacts with and monitors the neural network operation, and re-engages task execution and training
rules whenever changes in operating conditions degrade network performance. The schematic
architecture of the control system for teaching a two-link manipulator to make a tennis-like swing is
shown in Fig. 3.6.
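
The three-stage supervision described above amounts to a small state machine. The sketch below is
a hypothetical reduction of it: the error threshold, the function name, and the idea of summarizing
network performance by a single error value per step are our assumptions, not details from the
article.

```python
def supervise(errors, threshold=0.2):
    """Return which controller is in charge after each step (hypothetical supervisor logic)."""
    mode = "rules"                 # 1) the rule-based controller produces first-cut solutions
    log = []
    for err in errors:             # err: assumed measure of the network's current error
        if mode == "rules":
            # 2) the rules keep executing the task and teaching the network
            if err < threshold:
                mode = "network"   # hand control over once the network is good enough
        elif err >= threshold:
            mode = "rules"         # 3) performance degraded: re-engage rules and training
        log.append(mode)
    return log


log = supervise([0.9, 0.5, 0.1, 0.1, 0.4, 0.15])
```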

3.2.8  Waste  Water  Treatment  Sequence  Processing  -  Krovvidy  and  Wee  (1990)

This article describes a system for constructing waste water treatment sequences that treat several
compounds by reducing the concentration levels of the chemicals. The objective of the work is to
extract the information from an existing database in the form of a collection of expert system rules
and to use these rules to come up with the treatment train. The system consists of two phases: the
analysis phase and the synthesis phase. The expert system rules are developed in the analysis phase
using an inductive algorithm, and the treatment train is determined in the synthesis phase using a
Hopfield network. The system is schematically shown in Fig. 3.7.

3.2.9  Synthetic  Organic  Chemistry  -  Luce  and  Govind  (1990)

This system can be called a hybrid system, but not in the sense used here, because it is basically a
compound system with several neural networks carrying out different operations. The hybrid systems of
interest in this section integrate neural networks with knowledge-based or traditional systems,
rather than simply bundling several neural networks (possibly of different types) together, though
such bundled systems are also of significance in neural network research and applications.

In this paper, different neural networks are designed for the pattern recognition of molecular
subgroups in organic molecules, for use in producing and devising creative syntheses, and each
network recognizes one set of disconnection types. The bundled system design was inspired by
Minsky’s “society of mind.” The system design is shown in Fig. 3.8.

3.2.10  Fault  Diagnosis  -  Yamamoto  and  Venkatasubramanian  (1990) 

This article proposed a novel approach to fault diagnosis problems, especially those with multiple
faults. The conventional neural network approach to fault diagnosis uses only feedforward mappings
in an open loop, whereas the approach proposed here consists of two mapping operations, namely,
feedforward mapping and inverse mapping. The inverse mapping networks verify the results, lend
credibility to the output values of the forward mapping networks, and reduce the ambiguity in
generalization. The system consists of three main components: quantitative neural networks (QTN),
qualitative neural networks (QLN), and inverse qualitative neural networks (IQLN). Each module is
itself composed of multiple networks with the same structure: there are eight subnetworks in QTN and
five each in QLN and IQLN. All three types of networks are feedforward backpropagation networks with
one hidden layer. The general architecture of the system is shown in Fig. 3.9.
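
The role of the inverse mapping as a check on the forward mapping can be sketched as follows. The
two mappings are stand-in functions rather than trained networks, and the fault names, symptom
patterns, score cutoff, and tolerance are all hypothetical values chosen for illustration.

```python
def verify(observed, forward, inverse, cutoff=0.5, tol=0.25):
    """Keep only the candidate faults whose inverse-mapped symptoms match the observation."""
    accepted = []
    for fault, score in forward(observed).items():
        expected = inverse(fault)   # symptom pattern the inverse mapping associates with the fault
        # Mean absolute mismatch between expected and observed symptom patterns.
        mismatch = sum(abs(e - o) for e, o in zip(expected, observed)) / len(observed)
        if score > cutoff and mismatch <= tol:
            accepted.append((fault, score, mismatch))
    return accepted


# Stand-ins for trained networks (hypothetical): the forward mapping is ambiguous here,
# reporting two candidate faults with high scores.
PATTERNS = {"valve_stuck": [1, 0, 1], "sensor_drift": [0, 1, 1]}


def forward(observed):
    return {"valve_stuck": 0.9, "sensor_drift": 0.7}


def inverse(fault):
    return PATTERNS[fault]


accepted = verify([1, 0, 1], forward, inverse)
```

In this toy case the inverse check removes the second candidate, which is the kind of ambiguity
reduction the article attributes to the inverse mapping networks.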


The performance of the system was evaluated through testing on the following four groups of fault
cases in a chemical plant model: 1) single-fault cases, 2) two-fault cases I, where one fault resides
in the reactor and another in the distillation column, 3) two-fault cases II, where both faults are
within the reactor, and 4) sensor fault cases. The performance of the system is reasonable in that it
identified all the novel test cases among the single-fault cases, two-fault cases I, and the
sensor-fault cases. For two-fault cases II, half of the test patterns were correctly identified.

3.2.11  Other  Applications

There are many other applications that use a hybrid system in the problem solving process. Hybrid
systems have been proposed and used for many suitable problems in practice, such as monitoring and
diagnostics, process control, and optimization.

For diagnostic problems, Casselman and Acres (1990) use several neural networks in a large diagnostic
system, DASA/LARS, for monitoring and diagnosing spectrum anomalies associated with Frequency
Multiple Access satellite communication networks. The neural networks are trained on live sensor
information in an operational environment. For medical diagnostics, Saito and Nakano (1988) built a
prototype medical diagnostic expert system based on a three-layer backpropagation network trained on
the symptoms and diagnoses of about 300 patients. The system maps headaches, as the only symptoms,
to 23 diseases. The input layer represents the answers to each of the 230 questions on a
questionnaire. A scheme is also proposed to extract symbolic knowledge from the diagnostic results
of the neural networks and compare it with doctors’ knowledge. In veterinary medicine, a system for
the diagnosis of mastitis in dairy cows is proposed that integrates a production system module, a
neural network simulation module, and a knowledge acquisition module (Schreinemakers and Touretzky,
1990). The production system is OPS5, and all three modules communicate via OPS5 working memory
elements. The system showed excellent accuracy in diagnosing mastitis on a limited set of
experimental data. In engineering facility management, monitoring and diagnostics form an integrated
set. Tsoukalas and Reyes-Jimenes (1990) proposed a prototypic system for the monitoring and
diagnosis of a nuclear plant model. A backpropagation neural network with one hidden layer is used
to capture the correlation between sensor signals and the working state of different units. The
rule-based system is used to interpret the results from the neural network, make decisions, and
control the operations of working units.

Tsutsumi (1989) has done a series of works on robot and position control. Though his system does not
fit the pure mode of hybrid systems, his approach is worth mentioning. The article describes a
system consisting of two backpropagation networks and a Hopfield network for applications in
manipulator control. The input signals from the environment are mapped via a backpropagation network
into the internal space, where a Hopfield net minimizes the total energy according to the internal
space representation. The output signals of the Hopfield net are then mapped back into the
environment via another backpropagation net that performs the inverse mapping. Simulation studies on
manipulator configuration control show how the proposed system helps the manipulator to reach the
target point through the shortest path in the internal space. It should be pointed out that a similar
approach of adding an inverse mapping in the control loop was also proposed in Yamamoto and
Venkatasubramanian’s study (1990) on a chemical plant control model.

Though the neural network approach to planning is discussed in another section in this chapter,
Veezhinathan and McCormick’s work on plan reminding (1988) is interesting. As defined in the article,
plan reminding is the process by which we are reminded of a plan or a set of plans to achieve a given
goal or a combination of goals by taking into account certain familiar constraints automatically.
The characteristics of plan reminding are as follows: 1) it is indexed not only by goals, but also
depends critically on the context in which the goals occur, 2) constraint satisfaction is an
important consideration, and 3) it may involve inference. This article describes a prototypic
connectionist model for the task of errand planning in plan reminding within a production system.

3.2.12  Studies  on  Connectionist  Expert  Systems  and  Related  Works


Some works concerning the methodology for building connectionist expert systems and schemes for rule
extraction, concept mapping, generalization, etc., are also listed in the following references.

For example, Bochereau and Bourgine (1990) discussed the extraction of logical rules from a multilayer
neural network through building a validity domain for the network. The work is of significance for
theoretical study in building real neural expert systems with explicit knowledge about the internal
reasoning. However, its applicability is still an open question as no case studies were given in the
article. In Miller, Roysam and Smith’s work (1988), a general method for mapping a large class of
rule-based constraints to their equivalent stochastic Gibbs distribution representation was proposed.
This mapping thus makes it feasible to solve stochastic estimation problems over a rule-generated
constraint space within a Bayesian framework. The algorithm was also tested on an image
reconstruction problem.

In addition to Gallant’s (1988) seminal work on the construction of connectionist expert systems, new
schemes have also been proposed in that respect, such as the work by Yang and Bhargava (1990),
Touretzky, et al. (1986, 1987, 1990), Samad (1988) and Fu (1989). Some works also cover the
integration of fuzzy logic with neural networks (Kosko, 1987; Romaniuk and Hall, 1990).

3.3  Civil  Engineering  Applications

The application of neural networks in civil engineering ranges from the modeling of material
behaviors from experimental tests, damage assessment of structural systems, and structural analysis
and optimum design, to groundwater modeling. Research in this area is still exploratory, and the
open questions are: 1) the definition of the specific application domains, namely, in what
subdiscipline or subarea would neural networks most probably provide advantages and benefits over
the current approach? and 2) what kind of network is most suitable for problems in civil
engineering? The examples shown in the following paragraphs provide some test cases for judging the
pros and cons of neural network approaches in this field.

3.3.1  Modeling  of  the  Behaviors  of  Engineering  Materials 

The application of neural networks to material modeling was initiated in the Department of Civil
Engineering at the University of Illinois. The focus of this research group is on the modeling of
behaviors of engineering materials such as concrete, reinforced concrete, geo-materials, and
composites, the assessment of structural member damage in a structural system, and the
classification of groundwater transmissivity fields for use in the design of groundwater
contamination remediation. In material modeling, the behavior of a material under different stress
states determined from experiments is represented as a kind of knowledge in a neural network.

The paper published in the Proceedings of NUMETA-90 (Ghaboussi, et al. 1990) described a neural
network approach to the modeling of engineering material, specifically plain concrete. The complex
behavior of concrete in biaxial stress states under tension-tension, tension-compression and
compression-compression monotonic loading was modeled by a backpropagation neural network with two
hidden layers. The neural network-based model is stress controlled, that is, it predicts strain
increments from information on the current stress-strain state and the stress increments on a stress
path. The neural network concrete model learned the behavior with reasonable accuracy, and its
predictions on untrained stress paths were on par with those of analytical models.
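
To make the stress-controlled formulation concrete, the sketch below shows one way the training
pairs for such an incremental model could be assembled from a recorded stress path. The input layout
(current stress, current strain, stress increment) follows the description above, but the exact
ordering, the function name, and the numerical values are assumptions made for illustration.

```python
def make_training_pairs(stress_path, strain_path):
    """Build (input, target) pairs for a stress-controlled incremental material model.

    Input  = current stress, current strain, and the next stress increment.
    Target = the strain increment recorded for that step.
    """
    pairs = []
    for k in range(len(stress_path) - 1):
        d_stress = [b - a for a, b in zip(stress_path[k], stress_path[k + 1])]
        d_strain = [b - a for a, b in zip(strain_path[k], strain_path[k + 1])]
        pairs.append((list(stress_path[k]) + list(strain_path[k]) + d_stress, d_strain))
    return pairs


# Hypothetical biaxial compression path: (sigma_1, sigma_2) and (eps_1, eps_2) per load step.
stress_path = [(0, 0), (-5, -2), (-10, -4)]
strain_path = [(0.0, 0.0), (-0.0002, -0.0001), (-0.0005, -0.0002)]
pairs = make_training_pairs(stress_path, strain_path)
```

Each pair supplies six inputs and two targets to the network; a trained model is then stepped along
a new stress path by feeding its own predicted strains back in as the current state.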

The article in the ASCE journal (Ghaboussi, et al. 1991) formally proposed a neural network-based
material modeling methodology for engineering materials with complex mechanical behavior. The
behaviors of plain concrete in biaxial stress states under monotonic loading and in a uniaxial stress
state under cyclic compressive loading were modeled with backpropagation neural networks. A “3-point
scheme” was used to represent the history dependency of material behavior under cyclic loading.
Comprehensive testing was carried out to verify the neural network-based models against additional
experiments and analytic models based on principles of solid mechanics. The implications of the
network-based modeling methodology for the difficult problem of composite material modeling were
also outlined.

3.3.2  Structural  Analysis  and  Design
3.3.2.1  Deb’s  Approach  (1990) 

The design of a welded beam involves the determination of a feasible set of geometrical parameters
for the weld and the beam under a concentrated load at the free end, subject to constraints on the
shear stress in the weld, the bending stress in the beam, the buckling load, the beam end
displacement, and the width of the weld. Genetic Algorithms (GAs) are used to solve the problem with
four design parameters and five sets of constraints. The performance of GAs with three population
sizes (100, 50, 200) and corresponding probabilities of crossover and mutation is compared with that
of a traditional optimization method. The final results are very reasonable, and the maximum error
for the case with a population of 100 is about 3 percent.

The parameters used in the GA are: population size (100, 50, 200), string length (40), sub-string
length for each parameter (10), probability of crossover (0.9, 0.5-1.0), and probability of mutation
(reciprocal of the population size). Because a GA only needs to search for an optimal solution in a
subspace of the feasible space, GAs look very promising for structural optimization problems. In
addition, a genetic search procedure is implicitly parallel and thus provides fast searching
capabilities, especially on parallel machines.
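
The encoding implied by these parameters (four design parameters at 10 bits each, giving a 40-bit
string) can be sketched as below. The string and sub-string lengths and the mutation probability
come from the text; the parameter bounds, the linear decoding, and the use of single-point crossover
are assumptions made for illustration and may differ from Deb's implementation.

```python
import random

BITS_PER_PARAM = 10     # sub-string length quoted above
N_PARAMS = 4            # four design parameters -> 40-bit string


def decode(bits, bounds):
    """Map a 40-bit string to four real design parameters within the given bounds."""
    values = []
    for i, (lo, hi) in enumerate(bounds):
        chunk = bits[i * BITS_PER_PARAM:(i + 1) * BITS_PER_PARAM]
        integer = int("".join(map(str, chunk)), 2)
        values.append(lo + (hi - lo) * integer / (2 ** BITS_PER_PARAM - 1))
    return values


def crossover(a, b, p_cross, rng):
    """Single-point crossover (assumed operator) applied with probability p_cross."""
    if rng.random() < p_cross:
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]


def mutate(bits, p_mut, rng):
    """Flip each bit independently with probability p_mut."""
    return [1 - bit if rng.random() < p_mut else bit for bit in bits]


rng = random.Random(1)
pop_size = 100
p_mut = 1.0 / pop_size                                            # reciprocal of population size
bounds = [(0.125, 5.0), (0.1, 10.0), (0.1, 10.0), (0.125, 5.0)]   # hypothetical bounds
parent_a, parent_b = [0] * 40, [1] * 40
child_a, child_b = crossover(parent_a, parent_b, 0.9, rng)
design = decode(mutate(child_a, p_mut, rng), bounds)
```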

3.3.2.2  Hajela  and  Berke’s  Approach  (1990)

In structural design, the major computation lies in the analysis of the structural behavior for
different sets of design parameters under designated loading. In this paper, neural network models
are used to replace the structural analysis module in a nonlinear-programming-based optimization
environment. The feedforward backpropagation network and the functional-link net were used to capture
the load-displacement relationships of static structural analysis in the minimum weight design of a
five-bar truss, a ten-bar truss, and a wing-box structure, with constant nodal loading and
constraints on maximum nodal displacements, axial stresses, or both.

From the network performance study, it was found that the functional-link net was not very effective
for the highly nonlinear mapping of the load-displacement relation. The identification of a proper
input enhancement would be the key to the success of the functional-link net, and this task is not
easy to achieve. There were 16 design variables and 40 design constraints on both displacements and
stresses for the wing-box structure, and the training sets covered the possible lower and upper
bounds of each design variable. For each design case, each input node represents a design variable
(cross-sectional area) and each output node represents a nodal displacement. It appeared that the
neural networks achieved a near-optimum design for each structure.
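
As an illustration of what "input enhancement" means for a functional-link net, the sketch below
appends nonlinear terms to the raw design variables before they reach a single-layer network. The
particular choice of pairwise products and sinusoids is an assumption; the paper's point is
precisely that a suitable choice is problem dependent and hard to identify.

```python
import math
from itertools import combinations


def enhance(x):
    """Append pairwise products and sinusoids to the raw inputs (illustrative choice only)."""
    products = [a * b for a, b in combinations(x, 2)]
    sinusoids = [math.sin(math.pi * v) for v in x]
    return list(x) + products + sinusoids


# Three hypothetical cross-sectional areas become nine enhanced inputs.
enhanced = enhance([1.0, 2.0, 3.0])
```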

In our opinion, the advantage of this approach is the fast mapping of the load-displacement relation
after training. The pitfalls lie in the following aspects: 1) the structural geometry is fixed, i.e.,
a separate neural network is needed for each structural configuration, and 2) a large number of
training sets are needed to cover the lower and upper bounds of each design variable. For large real
structures, it would be too expensive to generate all the training data. A comparison with genetic
algorithms may shed some light on this problem, because only a small subspace of the training domain
is needed with genetic searching algorithms (Goldberg, 1989).

3.3.2.3  McAulay’s  Approach  (1987)

This paper also addressed the application of a backpropagation network in structural design and
proposed a new learning algorithm called “split inversion learning.” The basic approach of this
algorithm is to compute the weights for the output and hidden layers separately so that the error at
the output layer is minimized. In structural design, an inverse problem to structural analysis is
solved by a backpropagation network that models the displacement-load relation under varying
magnitudes of loading. For a static linear beam-truss problem, the loads and controlling
displacements are represented as input information, and the design variables, such as the dimensions
of each structural member, as output information. No numerical solution was shown in the paper.

Further study needs to be done on the proposed learning algorithm before it becomes a viable one. As
discussed, the update of weights can be obtained from solving a linear system of equations only if
the number of data sets is equal to the dimension of the output layer. In this case, two systems of
linear equations, corresponding to that from input layer to output layer and that from hidden layer
to output, need to be solved. If the condition is not satisfied, or if the number of training sets
is larger than the dimension of the output layer, which is the usual case in practice, then a linear
least squares problem must be solved instead. This makes the learning algorithm awkward and
inefficient. In addition, the remarks suggesting the conjugate gradient method for the
ill-conditioned linear system of equations were misleading, because conjugate gradient, even with a
simple preconditioner, is also ineffective for such highly ill-conditioned systems.
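
The distinction between the square case and the overdetermined case can be shown on a single linear
layer. The sketch below is not McAulay's split inversion algorithm; it is a generic illustration,
with hypothetical function names and data, of solving for weights exactly when the system is square
and through the normal equations when there are more training sets than unknown weights.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_layer(h, t):
    """Weights w minimizing ||h w - t||; exact when h is square and nonsingular."""
    if len(h) == len(h[0]):
        return solve(h, t)
    # Overdetermined case: normal equations (h^T h) w = h^T t.
    # Forming h^T h squares the condition number, which is one reason
    # ill-conditioned training data make this route numerically delicate.
    cols = list(zip(*h))
    hth = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    htt = [sum(p * q for p, q in zip(ci, t)) for ci in cols]
    return solve(hth, htt)


w_square = fit_linear_layer([[1.0, 0.0], [1.0, 1.0]], [2.0, 5.0])
w_over = fit_linear_layer([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.0, 5.0])
```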

3.3.2.4  Rehak,  Thewalt,  and  Doo’s  Approach  (1989)

This paper is probably the first one addressing the application of neural networks to structural
mechanics in civil engineering. By considering the summation rule for a neuron in a neural network, a
neural model of a spring structure is construed as a set of computational elements in structural
mechanics, and its use in system identification for a dynamic system, specifically a
one-degree-of-freedom oscillator with viscous damping, was illustrated. Since this approach was
basically drawn from the analogy between the mapping properties of linear neural networks and the
solution procedure for linear systems, the applicability of the approach is limited and it does not
offer any improvement on the current approach to structural system analysis. In spite of this, the
major contribution of this paper lies in its realization of the possible impact of neural networks
on system identification, which is also of significance in active structural control and structural
dynamics.

From a critical review of this article, one should recognize the limited scope of application of
neural networks; that is, neural networks should be used in those areas where they have the potential
to yield better performance than the currently used methodologies. Identifying such areas requires
innovative thinking and critical reasoning.

3.3.3  Structural  Damage  Assessment  and  Fatigue  Prediction
3.3.3.1  Garrett,  et  al.  (1990)

The paper by Garrett, et al., is essentially a collection of term project reports produced in
Professor Garrett’s class in the Department of Civil Engineering at the University of Illinois. It
addresses the following prototypic engineering applications of neural computing: 1) an adaptive
controller for building thermal mass storage, 2) an adaptive controller for adjustment of a combine
harvester, 3) an interpretation system for non-destructive testing data on masonry walls for damage
detection, 4) a machining feature recognition system in process planning, 5) an image processor for
classifying land features from satellite images, and 6) a system for designing a pumping strategy for
contaminated groundwater remediation. Backpropagation networks were used in the first five
applications and the Hopfield network was used for the last application. All the results reported in
this paper are exploratory and preliminary, and further success rests on intensive work in those
directions.

3.3.3.2  Troudet  and  Merrill  (1990)

This paper described a neural network approach to estimating in real time the fatigue life of
mechanical components in the Intelligent Control Systems (ICS) for Reusable Rocket Engines (RRE) at
the NASA Lewis Research Center. This fatigue life estimator consists of two functional blocks: a
preprocessor and a neural network. The function of the preprocessor is to identify a load cycle and
store the extreme load values in a shift-register buffer, which is then directly mapped to the input
layer of a backpropagation neural network. The identification of a load cycle is based on the
Uniaxial Local Strain Approach (Dowling, 1972; Palmgren, 1945; and Miner, 1945).
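
The buffering part of the preprocessor can be sketched as below. This is a deliberate
simplification: it only extracts the extreme values (peaks and valleys) of a sampled load history
and pushes them through a fixed-length buffer, and it omits the cycle identification of the local
strain approach. The function names and the zero-filled initial buffer are assumptions; the buffer
length of 15 matches the size of the network input layer reported for the estimator.

```python
from collections import deque


def turning_points(load_history):
    """Return the first sample, the local extremes, and the last sample of a load history."""
    points = [load_history[0]]
    for prev, cur, nxt in zip(load_history, load_history[1:], load_history[2:]):
        if (cur - prev) * (nxt - cur) < 0:   # the slope changes sign at cur
            points.append(cur)
    points.append(load_history[-1])
    return points


def shift_register(load_history, size=15):
    """Push the extreme load values through a fixed-length shift register."""
    buffer = deque([0.0] * size, maxlen=size)   # the oldest values fall off the left end
    for value in turning_points(load_history):
        buffer.append(value)
    return list(buffer)


extremes = turning_points([0, 1, 2, 1, 0, -1, 3])
network_input = shift_register([0, 1, 2, 1, 0, -1, 3])
```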

The backpropagation network has 15 nodes in the input layer, 100 and 50 nodes in the first and second
hidden layers respectively, and one node in the output layer. The input nodes represent the content
of the shift register, consisting of the load cycle boundaries and the peak-to-peak transitions of
the previous cycle-free load history, and the output represents the contribution of a hysteresis loop
to the fatigue life. The performance of the network-based fatigue estimator is reasonable, with
75 percent of the estimated values within a factor of 1.7 of the exact data. No figures in the
article showed the training and testing results.

3.3.3.3  Wu,  Ghaboussi,  and  Garrett  (1991)

A neural network approach has been proposed for assessing damage to structural elements in a
structural system by classifying deviations in the system behavior. From structural mechanics, it is
known that the response spectrum of a damaged structure in the frequency domain differs by a certain
amount from that of the intact structure. Therefore, the damage of a structural system can be
identified by a neural network if the network is trained on the response spectra corresponding to
different damage states. A three-story shear building was used in this study.

The approach has three computational steps: 1) the response of the structure under seismic
excitation is determined in the time domain, 2) this response is transformed into a response spectrum
in the frequency domain through the Fast Fourier Transform (FFT), and 3) the normalized spectrum is
used for training a backpropagation network. Different architectures with one or two hidden layers
have been investigated, and it was found that the architecture had little effect on the performance
of the trained network. The training results were perfect, and the test results on untrained cases
were reasonable. Work is being extended to the damage assessment of framed structural systems such
as an offshore oil tower.
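
Steps 2 and 3 of the procedure can be sketched as follows. For brevity the sketch evaluates the
discrete Fourier transform directly rather than with an FFT algorithm (the result is the same, only
slower), and it normalizes the amplitude spectrum by its peak value, which is an assumption since
the normalization used in the study is not stated here. The sinusoidal test signal is a hypothetical
stand-in for a computed structural response.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform up to the Nyquist frequency."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def normalized_spectrum(signal):
    """Scale the amplitude spectrum so that its largest component equals 1."""
    spectrum = amplitude_spectrum(signal)
    peak = max(spectrum)
    return [s / peak for s in spectrum] if peak > 0 else spectrum


# Hypothetical response: a pure sinusoid completing 4 cycles over 64 samples.
response = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
network_input = normalized_spectrum(response)
```

A shift of the dominant peak toward lower frequency bins would be the kind of deviation that a
network trained on such spectra is expected to associate with a loss of stiffness.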

3.3.4  Groundwater  Remediation

Several applications of neural networks in groundwater remediation are currently being investigated
at the Department of Civil Engineering at the University of Illinois, using multilayer feedforward
and Hopfield type networks. The researchers considered a hydraulic gradient control technique to
determine the optimal pumping strategy for groundwater remediation under conditions of uncertainty.
Their design method adopted a stochastic approach, where many equally likely sets of the uncertain
parameter are included simultaneously in the design model. The major source of uncertainty is
assumed to be embedded in the heterogeneity of the hydraulic conductivity parameter. Although many
parameter fields are considered, only a few critical fields will impact the final design. The
spatial distribution of the hydraulic conductivity values in a hydraulic conductivity field
determines how critical that field is.

The first application of a neural network was to train a feedforward type network to learn, via error
backpropagation, the association between a hydraulic conductivity field and its impact on the design.
This network then classifies a large set of hydraulic conductivity fields according to their level
of criticalness. Promising results have been obtained in this area of application. The trained
neural network will be used as a prescreening tool, looking for the critical hydraulic conductivity
fields, in the groundwater remediation design procedure.

The pumping strategy for hydraulic gradient control is determined by solving an optimization model.
The second application of the neural network was to set up a Hopfield style network to solve the
optimization model through simultaneous constraint satisfaction. This approach enables the solution
of optimization models with three components of pumping cost: cost of installation, cost of pump
machinery, and cost of pump operation. Traditional linear programming techniques could run into
computational complexities in this case. The preliminary results show that the neural network
approach to optimization has the capability to converge to solutions that are optimal or
near-optimal.

3.4  System  Identification


In engineering applications, a vital part of analysis is to model and estimate the behavior of a
system from observations of its input-output information. There are usually two steps in the
estimation of a system behavior: functional estimation and parameter estimation. In a conventional
sense, system identification is concerned with the determination of the parameters of a system after
a system function with unknown parameters has been assumed. According to Zadeh (1962), the system
identification problem can be defined as “the determination, on the basis of input and output, of a
system within a specified class of systems, to which the system under test is equivalent.”

There are many ways to represent a system identification problem. In electrical control engineering,
the early approach to system identification was concerned with the determination of the transfer
function of a system. The transfer function could be determined by applying a known input signal to
an assumed system, measuring the response of the system, and fine-tuning the system parameters until
the expected response was produced. This kind of approach is unsuitable for nonlinear systems or
systems with measurement and process noise. Though the system parameters for a linear system can
be determined in one step by solving a least squares problem, the determination of system parameters
for a nonlinear system is an iterative process. A general form to represent both linear and nonlinear
systems is the Kolmogorov-Gabor polynomial (Gabor, et al., 1961).
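
For reference, the Kolmogorov-Gabor polynomial is commonly written as the expansion below. The notation is an assumption made here for illustration and does not come from the survey: x_1, ..., x_m are the inputs, y is the output, and the a's are the unknown coefficients.

```latex
y = a_0
  + \sum_{i=1}^{m} a_i x_i
  + \sum_{i=1}^{m} \sum_{j=1}^{m} a_{ij} x_i x_j
  + \sum_{i=1}^{m} \sum_{j=1}^{m} \sum_{k=1}^{m} a_{ijk} x_i x_j x_k
  + \cdots
```

Truncating the series after the first-order terms leaves a model that is linear in the inputs. Its coefficients can be found in one step by least squares, which is the linear case mentioned above.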

Neural networks with supervised learning, such as backpropagation networks, have been investigated
for solving system identification problems because of the functional mapping capability of
feedforward networks (Hornik, et al., 1989). It has been observed through experiments with the modeling
and prediction of chaotic time-varying systems that a feedback mechanism is necessary to identify a
system with dynamic response. Details of this argument are illustrated in research described in this
section. The unique feature of neural network-based system identification methodology should also be
reiterated. Instead of obtaining explicit expressions for the system functions and corresponding
parameters, a neural network solves a system identification problem through training on the observed
input-output data sets. The underlying function and parameters of a system identified by a neural
network are embodied in the pattern of connection weights of the network after proper training. In
the following paragraphs, some of the typical approaches to system identification using neural
networks are described.

3.4.1  Lapedes  and  Farber  (1987) 

This paper has probably become the classic reference on the neural network approach to prediction
and system modeling. A standard three-layer backpropagation neural network with one hidden layer is
used to predict points in a highly chaotic time series by using a time window representation scheme.
Three previous points on the time coordinate are presented to the input layer, and the response at
the next time station is used as the target output in the output layer for prediction. For this
problem, the neural network performs better than some conventional methods (such as the linear
predictive method), giving an increase in accuracy of orders of magnitude. An interesting experiment
was also performed to study the underlying approximation capability of the network by using a
sinusoid instead of the usual sigmoid as the transfer function, so that the network produced a
generalized Fourier approximation.
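
The time window representation can be sketched as follows. This is a minimal illustration, not the authors' code: the logistic map is used only as a convenient stand-in for a chaotic series, and the function names and parameter values (`logistic_map`, `make_windows`, `r`, `x0`) are hypothetical.

```python
def logistic_map(n, r=4.0, x0=0.3):
    """Generate n points of the logistic map x[t+1] = r * x[t] * (1 - x[t])."""
    series = [x0]
    for _ in range(n - 1):
        series.append(r * series[-1] * (1.0 - series[-1]))
    return series


def make_windows(series, window=3):
    """Pair each run of `window` consecutive points with the point that follows it."""
    inputs, targets = [], []
    for t in range(len(series) - window):
        inputs.append(series[t:t + window])   # values shown to the input layer
        targets.append(series[t + window])    # value the output unit should predict
    return inputs, targets


series = logistic_map(10)
inputs, targets = make_windows(series, window=3)
for x, y in zip(inputs, targets):
    print([round(v, 4) for v in x], "->", round(y, 4))
```

Each (window, next value) pair would serve as one training pattern for the network; the window length of three follows the description above.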

3.4.2  Tenorio  and  Lee  (1989)


Tenorio and Lee designed their Self-Organizing Neural Network (SONN) algorithm specifically for
system identification problems, combining function approximation and parameter estimation in a
unified process. The algorithm performs a search on the model space by the construction of
hypersurfaces, so the identification of a nonlinear system is viewed as the construction of an
(N + 1)-dimensional hypersurface when the system is represented by a Kolmogorov-Gabor polynomial of
highest order N. The architecture of the network evolves while training takes place. The SONN
consists of three processes: 1) a generating rule for the primitive neuron transfer functions, 2) an
evaluation method that measures the quality of the model, and 3) a structure search strategy for
adjusting the architecture of the network. When tested on modeling the chaotic time series generated
from the Mackey-Glass differential equation, SONN gives a satisfactory performance compared with
Lapedes and Farber’s work (1987).

3.4.3  Fernandez,  Parlos  and  Tsai  (1990)

This article describes a recurrent multi-layer perceptron network and its use in the identification
of nonlinear dynamic systems based on input-output measurements. Feedback connections between
layers and intralayer recurrent connections are introduced in the network, and the learning algorithm
is a modified version of the backpropagation learning rule. The network-based identification of a
boiler model gave satisfactory results, although perceptible training errors remained. The
architecture of the neural network is shown in Fig. 3.10.


This article describes a Direct Linear Feedthrough (DLF) structure, a variation on the
backpropagation network that adds direct connections from the input layer to the output layer, and
the application of this architecture to the process identification problem. In a three-layer
backpropagation network with one hidden layer, the connections from input layer to output layer
represent a linear system, whereas the remaining network models the nonlinear dynamics in the system.
The general architecture is shown in Fig. 3.11.
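
A forward pass through such a structure can be sketched as follows. This is only an illustration under stated assumptions: a single linear output unit, sigmoid hidden units, and hypothetical names (`dlf_forward`, `w_direct`, and so on) that do not appear in the article.

```python
import math


def sigmoid(z):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def dlf_forward(x, w_hidden, b_hidden, w_out, w_direct, b_out):
    """Single-output forward pass: direct linear path plus a sigmoid hidden layer."""
    # Nonlinear path: input -> hidden (sigmoid) -> output.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    nonlinear_part = sum(v * h for v, h in zip(w_out, hidden))
    # Linear path: direct input -> output connections (the "feedthrough").
    linear_part = sum(d * xi for d, xi in zip(w_direct, x))
    return linear_part + nonlinear_part + b_out


y = dlf_forward(x=[1.0, 2.0],
                w_hidden=[[0.5, -0.5]], b_hidden=[0.0],
                w_out=[0.0], w_direct=[2.0, 3.0], b_out=1.0)
print(y)
```

With the hidden-to-output weights set to zero, as in the usage line, the output reduces to the linear part alone, which shows how the direct connections carry the linear component of the system.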

The  computational  properties  of  the  network  are  tested  on  the  process  identification  of  a  surge 
tank  that  has  a  stream  flowing  into  it  at  an  independently  determined  rate  and  flowing  out  of  the  tank 
at  a  rate  proportional  to  the  square  root  of  the  height  of  fluid  in  the  tank.  When  trained  on  the  whole 
set  of  data,  the  DLF  results  in  better  learning  accuracy  and  shorter  learning  time  than  the  standard 
backpropagation  network.  The  DLF  architecture  is  also  compared  with  a  standard  backpropagation 
network  on  the  generalization  capability  by  training  on  a  training  subset  and  testing  on  the  remaining 
set  of  data.  The  DLF  architecture  shows  better  accuracy  and  extrapolation  capability  on  the  testing 
cases. 
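
The surge tank described above can be simulated with a simple mass balance to generate identification data. The sketch below assumes the model A dh/dt = q_in - k sqrt(h) and uses forward Euler integration; the tank area, outflow coefficient, step size and function name are hypothetical values chosen for illustration, not those of the article.

```python
import math


def simulate_surge_tank(inflows, h0=1.0, area=1.0, k=1.0, dt=0.1):
    """Forward Euler integration of area * dh/dt = q_in - k * sqrt(h)."""
    heights = [h0]
    for q_in in inflows:
        h = heights[-1]
        outflow = k * math.sqrt(max(h, 0.0))      # outflow proportional to sqrt(height)
        h_next = h + dt * (q_in - outflow) / area
        heights.append(max(h_next, 0.0))          # the level cannot go negative
    return heights


# Constant inflow of 2.0: the level should settle where k * sqrt(h) = 2.0.
heights = simulate_surge_tank([2.0] * 2000)
print(round(heights[-1], 4))
```

The recorded (inflow, height) pairs would form the input-output data set on which a network such as the DLF is trained.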


3.4.5  Hakim,  et  al.  (1990) 

A feedback mechanism is a necessity in building a neural network system with dynamic behavior. In
nonlinear signal processing and time series prediction, dynamic recurrent network models would be
better suited for studying chaotic and nonstationary time series. A new neural network architecture is
proposed in this article to solve the system identification problem by introducing a recurrent
mechanism into the network through clustering and interconnections between nodes among different
clusters in the hidden layer. Two neuron models, the classical neuron with graded response used by
Hopfield and Tank, as well as a discrete-time model, are studied in detail. The parameters used for
solving a system identification problem consist of the weight matrix of the middle neuron layer, the
input and output connection matrices, the size and topology of the interconnection neighborhood, the
neurons’ time constants, and the shape of the nonlinearities. The architecture of the network is shown
in Fig. 3.12.

The performance of the network is remarkable on two benchmark problems: the prediction of the
logistic function with chaotic behavior and building a neural network Frequency Shift Keying (FSK)
demodulator. Behaviors of the time parameters and the number of neurons in the middle layer are
also investigated. One advantage of this architecture is that it can handle information from
multichannel measurements of the system.

Figure 3.12 - Architecture of the Neural Network for System Identification

3.4.6  Nishimura  and  Arai  (1990) 

This article describes a structured neural network and its application to power system state
evaluation for proper power system control and operation (Fig. 3.13). The voltage measurement at
different nodes of the power system is used to identify patterns of system performance. The new
architecture consists of an input layer, a receptive layer, an associative layer, logic units and a
feedback mechanism, as shown in Fig. 3.14. The network is constructed by wiring instead of learning,
and the modeling and interpolation capability of the network is provided by the selective activation
coefficients and the relative contribution coefficients.

The  composition  of  each  layer  in  the  structured  network  is  as  follows.  The  input  layer  serves  as  the 
usual  input  terminals;  the  receptive  layer  is  composed  of  several  receptive  strips  that  correspond  to 
subpatterns  in  the  input  data;  the  associative  layer  integrates  the  outputs  of  the  receptive  layer;  and  the 
logic units process the information on the associative layer and generate the final output. The
feedback mechanism changes the selective activation coefficients and the relative contribution
coefficients according to some outputs of the logic units, which also provides a deeper reasoning
capability than a backpropagation network. The proposed network is reported to work very well on
power system models.

3.4.7  Narendra  and  Parthasarathy  (1990) 

This article describes the application of neural networks in the identification and control of
nonlinear dynamical systems. Multilayer backpropagation neural networks and recurrent networks are
treated in a unified fashion. Schemes for static and dynamic adjustment of parameters of neural
networks are also proposed. Processes for direct adaptive control and indirect adaptive control of
nonlinear systems using neural networks are presented with satisfactory simulation results.

Our coverage of the application of neural networks to system identification provides only a
glimpse of this field because of time and space limitations. It is also of interest to note that
system identification via neural networks has not been seriously addressed for structural systems in
civil engineering, except for the short remarks made by Rehak, et al. (1989). Research in this area
would be valuable and important in providing robust tools and insight for structural system analysis.

3.5  Forecasting  and  Prediction

From the history of scientific research, it cannot be overemphasized that one of the central
problems of science is to discover the underlying laws of the universe, so that a natural phenomenon
can not only be understood but its future can also be forecast or predicted. It is also interesting to
realize that the ability to give proper predictions is closely tied to the quality of life, as
exemplified by weather forecasting and economic planning.

As pointed out by Weigend, et al. (1990), two types of knowledge are required in order to forecast
the behavior of a natural system, namely, knowledge of the underlying laws and discovery of strong
empirical regularities in observations of the system. For the former, the prediction problem
functions like an initial value problem, which is fully determined by the differential equation and
its initial conditions; for the latter, the behavior of the system is represented by its periodicity,
whether the periodicity is apparent or masked by noise. In other words, the system should be properly
identified first. It is thus natural that system identification is the overture to forecasting.
Traditional prediction uses statistical methods such as curve fitting and regression analysis. Neural
networks, due to their learning capability and intrinsic statistical characteristics, provide a
potential tool and modeling methodology in this vital area of scientific endeavor.

Real-world prediction problems, ranging from bond rating (Dutta and Shekhar, 1988) and power
system load forecasting (Atlas, et al., 1990) to sunspot activity, have been investigated by
researchers from different disciplines. Due to their intrinsic properties, problem domains with
underlying principles or well-defined models are easier to forecast than nonconservative domains in
which no well-defined models exist. In the following paragraphs, some of the studies in this
direction are described.

3.5.1  Farmer  and  Sidorowich  (1987)

Though  this  article  does  not  address  the  application  of  a  neural  network  to  prediction,  it  does 
influence  the  thinking  and  benchmark  construction  on  neural  network  based  prediction  models.  The 
article  describes  a  forecasting  technique  specifically  designed  for  chaotic  data  by  embedding  a  time 
series in a state space using delay coordinates and modeling the nonlinear mapping using local
approximation. The local approximation approach performs significantly better than the global
approximation method introduced by Gabor, et al. (1960) and autoregressive models, on modeling the
logistic map, the Mackey-Glass delay-differential equation, Taylor-Couette flow, and Rayleigh-Benard
convection.

3.5.2  Dutta  and  Shekhar  (1988) 

This article describes the application of neural networks to the prediction of the ratings of
corporate bonds, which belongs to the nonconservative problem domains where a domain model or theory is
not  well  defined.  For  this  problem,  conventional  mathematical  modeling  techniques  such  as  statistical 
regression  models  have  yielded  poor  results  and  it  is  difficult  to  build  rule-based  expert  systems. 

Feedforward networks with two and three layers were used in this study. Ten variables were selected
as input units and the output unit represents the rating of the bond. Bond ratings and values of
the financial variables for a set of industrial bonds issued by 47 companies were chosen at random as
the data set; 30 of them were used for training and the remaining data for testing. The neural
network-based models were compared with standard regression models on the accuracy of classification
of each bond. To determine the minimal set of influential variables, six financial variables were
also used for training and testing. The reported results indicate that neural network-based models
outperformed the regression model by a considerable margin on both learned (92.3 vs. 61.5 percent)
and testing cases (83.3 vs. 50 percent), and the three-layer network gives the best learning results.
On the testing cases, the performance of the two-layer network is the same as that of the three-layer
network (83.3 percent). The reason is that the selected ten input financial features of a bond are
relatively high-level abstractions. On the other hand, learning with ten input variables gives better
performance than learning with six input variables. One interesting result is that, on misclassified
cases, the network prediction is off by at most one grade, whereas the regression model is often off
by several ratings.

The success of the neural network-based models rests on the modelers’ in-depth understanding of the
problem and correct judgement in selecting representative financial variables. This in turn
indicates the importance of deriving a good representation scheme.

3.5.3  Fozzard,  Bradshaw,  and  Ceci  (1989) 


This article describes an application of a backpropagation network to daily solar flare forecasting
and a comparison of the network prediction with a rule-based expert system. An interesting feature of
the approach is that it uses the identical representation scheme as used for the rule-based system.
The architecture of the network is shown in Fig. 3.15. The 3 output units represent the 3 classes of
solar flares to be forecast, and the 17 input units provide a distributed coding of the 10 categories
of input data that are used for the expert system. The network is trained and tested with data from
the database of the expert system, and the performance of the network is at least as good as the
expert system.

It should be pointed out that the network was only tested on a small segment of the 11-year solar
cycle  and  no  other  representation  scheme  was  investigated  for  the  generalization  analysis.  However, 
this  preliminary  study  does  show  the  promise  of  neural  networks  in  the  field  of  forecasting  where  no 
underlying  physical  principle  seems  apparent  at  the  moment. 

3.5.4  Sharda  and  Patil  (1990) 


This article reports a comparative study of the forecasting capability of neural networks and
conventional models based on Box-Jenkins methods. A backpropagation network with an architecture
similar to NETtalk (Sejnowski and Rosenberg, 1986) is used to model the time series via time windows.
The simulation results showed that neural network models and conventional models performed equally
well on the simulation problem, so neural networks can actually be used as a forecasting tool. The
advantage of a neural network is that it is a very simple model and is easy to build.

3.5.5  Walter,  Ritter  and  Schulten  (1990)

This article describes a new approach to the prediction of nonlinear time sequence data, namely,
the prediction of the 3-D motion of an object in a set of nonlinear potentials of different orders of
nonlinearity. The main idea is to use a Kohonen network to adaptively discretize the set of the input
data, and to estimate in each discretization cell a separate set of linear prediction coefficients. A
lattice with 12 x 12 neurons is used in this study. The network is trained with a series of
trajectories with randomly chosen starting values, and reasonable performance is obtained.
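
The idea of fitting a separate linear predictor in each discretization cell can be sketched in one dimension as follows. This is a deliberate simplification, not the authors' method: the cell prototypes are fixed by hand rather than adapted by a Kohonen network, and all names (`nearest_cell`, `fit_local_linear`, `predict`) are hypothetical.

```python
def nearest_cell(x, prototypes):
    """Index of the prototype closest to x (the discretization cell of x)."""
    return min(range(len(prototypes)), key=lambda i: abs(x - prototypes[i]))


def fit_local_linear(xs, ys, prototypes):
    """Fit y ~ a * x + b separately in each cell by ordinary least squares."""
    cells = [[] for _ in prototypes]
    for x, y in zip(xs, ys):
        cells[nearest_cell(x, prototypes)].append((x, y))
    coeffs = []
    for pts in cells:
        n = len(pts)
        if n < 2:  # too few points for a slope: fall back to a constant
            coeffs.append((0.0, pts[0][1] if pts else 0.0))
            continue
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        sxx = sum((p[0] - mx) ** 2 for p in pts)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
        a = sxy / sxx if sxx > 0 else 0.0
        coeffs.append((a, my - a * mx))
    return coeffs


def predict(x, prototypes, coeffs):
    """Predict with the linear coefficients of the cell that contains x."""
    a, b = coeffs[nearest_cell(x, prototypes)]
    return a * x + b


# A nonlinear target (absolute value) becomes linear inside each of two cells.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [abs(x) for x in xs]
prototypes = [-1.0, 1.0]
coeffs = fit_local_linear(xs, ys, prototypes)
print(predict(-1.2, prototypes, coeffs), predict(0.8, prototypes, coeffs))
```

In the article the cells come from the adaptive discretization performed by the Kohonen lattice; the sketch only shows why piecewise linear coefficients can follow a nonlinear mapping.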

The unique feature of this approach is the use of a Kohonen network instead of the usual
feedforward type network. It would be of interest in the field of prediction if an integrated system
could be built with features from feedforward networks and self-organizing networks.

3.5.6  Weigend,  Huberman  and  Rumelhart  (1990) 


This report describes the extension of feedforward networks utilized by Lapedes and Farber (1987)
to predict future values of possibly noisy time series by extracting knowledge from the past. A
three-layer backpropagation network with one hidden layer (shown in Fig. 3.16) and its variation
through a weight-elimination scheme and a time window representation scheme are used in modeling the
behavior of the time-varying system. The weight-elimination scheme is derived from a more complex cost
function than the usual squared error cost function in that a cost measurement associated with each
connection weight in the network is included. The issues of over-fitting and generalization capability
while training a large network, and sigmoidal transfer functions vs. radial basis functions, are
analyzed numerically in the system training process. It was found that a densely trained network using
the weight-elimination scheme showed better generalization capability than sparse networks.
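
For orientation, the weight-elimination cost function is commonly stated in the form below. The survey itself does not give the formula, so the symbols are assumptions made here: t_k and o_k are the target and network output for pattern k, w_i runs over the connection weights, w_0 is a scale parameter, and lambda sets the relative importance of the complexity term.

```latex
E = \sum_{k} \left( t_k - o_k \right)^2
  + \lambda \sum_{i} \frac{w_i^2 / w_0^2}{1 + w_i^2 / w_0^2}
```

Each term of the second sum approaches zero for a weight much smaller than w_0 and approaches one for a weight much larger than w_0, so the penalty roughly counts the weights that are significantly different from zero.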

Two benchmark problems in statistics were used to test the forecasting capability of neural
networks, namely, sunspot activity prediction and the modeling of computational ecosystems. For the
sunspot problem, the neural network model outperformed the threshold autoregressive model by Tong
and Lim (1980), which is considered the best statistical model to date. For the prediction in a
computational ecosystem, a three-layer backpropagation network with one hidden layer is trained on a
time series for the use of resources. Single and multi-step predictions of the ecosystem show good
agreement between data and forecast values. The eventual forecasting function after several million
iterations exhibited a frequency spectrum very similar to that of the original data.

3.5.7  Werbos  (1988) 

It is perhaps fair to say that Werbos was among the first to apply neural networks to real-world
problems. This article describes a generalization of dynamic feedback in the backpropagation network
to deal with recursive time-dependent networks, and its use in prediction, optimization over time and
the analysis of the properties of a natural gas market model that has been used in a major study of
natural gas deregulation.

3.5.8  White  (1988) 

This article describes the application of neural networks to an enticing field: stock market
prediction. The objective of the work is to determine whether a neural network can decode previously
undetected regularities in asset price movements, such as the daily fluctuations of common stock
prices, using the case of IBM daily common stock returns as an example. A sample of 1000 days of data
was selected out of the available 5000 days of return data as a training set, together with samples
of 500 days before and after the training period as testing cases. A three-layer backpropagation
network with one hidden layer is used in this study. Though results from this endeavor were not
satisfactory, some valuable insights are worth mentioning: 1) modeling this highly nonconservative
problem system is not easy with simple networks, 2) the simple network has the tendency to over-fit
the price time series, and 3) the simple network is capable of extremely rich dynamic behavior.

3.6  Control

In the control of a dynamic system, there usually exist two processes, namely, the learning or
identification of the system and the extraction and enforcement of control signals. Control theory is
a very well defined domain with much literature, and many learning methods used in neural network
learning are closely related to methods that have been intensively studied in adaptive control
theory. On the other hand, because of the way a neural network operates and performs, neural
network-based control has special characteristics that are lacking in traditional control theory. It
has been illustrated in previous sections that neural networks are powerful tools for capturing the
characteristics of a system through self-organization or learning. Because of this adaptive parameter
estimation capability, neural networks are very well suited for application in engineering control.

To date, many kinds of neural networks have been studied for control applications. As in system
identification, systems consisting of supervised learning models have found the most extensive usage
in control. Backpropagation learning and reinforcement learning are considered to be the most
suitable learning algorithms. One of the advantages of using reinforcement learning is that the
learning process can be accomplished on-line. In addition, reinforcement learning provides
flexibility in extracting control signals because it addresses the problem of improving performance
as evaluated by any measure whose values can be supplied to the learning system (Barto, 1989).

The application of neural network-based control techniques has covered a broad range. Perhaps
the most noteworthy are the truck backer-upper and broom balancer problems solved by Widrow and
Nguyen (1987 and 1989). The majority of neurocontrol applications are in robot control and
manufacturing process control. The following list of references gives a sketchy picture of
applications in this field.

3.7  Diagnostics  Systems


Fault diagnosis of a system, due to its relation with pattern recognition, is probably one of the
most suitable areas for the application of neural networks. Many real-world problems involve
detection, such as the diagnosis of disease from symptoms or the diagnosis of engine malfunction
through observation of engine behavior. The detection and diagnosis of faults can usually be
processed in two steps: the recognition of abnormality in the system and the identification of causes
for the abnormality or faults.

There are many approaches to the diagnosis of a system in general, such as the rule-based system,
simulation-based expert system and neural network-based system. Though rule-based systems have
been used successfully in many medical and engineering applications, the process of encoding
knowledge in rules and the knowledge acquisition process are complicated and not easily achieved in a
short time frame. For systems with the simulation of a physical system in the knowledge base, the
simulation process is computationally intensive and time consuming. Because of the slow response time
of an expert system-based approach to diagnosis, real-time application of those approaches would
be too difficult to achieve. On the other hand, neural networks such as backpropagation networks,
after proper training, will provide nearly real-time response for a diagnosis system in real-world
applications, and the system can be developed in a short time.

The architecture of neural network diagnosis systems has been investigated by many researchers
on different problems. Depending on the nature and complexity of the problem, it can take different
forms, which include the simple single backpropagation network, a hierarchy of different
backpropagation networks such as that used in jet engine diagnosis (Dietz, et al., 1990), a hybrid
system combining neural networks and rule-based systems (Tsoukalas and Reyes-Jimenez, 1990; Saito and
Nakano, 1988; Schreinemakers and Touretzky, 1990), and a system incorporating supervised learning and
self-organizing mechanisms.

The application of neural networks to diagnosis problems ranges from medical diagnosis and fault
diagnosis in electrical systems to the maintenance and monitoring of chemical processes. In medically
related diagnosis, several diseases have been studied, such as the diagnosis of epilepsy through
symptoms in a computer aided medical diagnosis system (Appolloni, et al., 1990), the diagnosis of
disease of newborn babies through analysis of radiology images (Boone, et al., 1990), and the
diagnosis of low back pain (Bounds, et al., 1988). A hybrid expert system incorporating a neural
network for medical diagnosis has been proposed by Saito and Nakano (1988), and Schreinemakers and
Touretzky (1990) proposed the use of OPS5 functions for the construction of a hybrid system for
diagnosing mastitis in cows.

Perhaps the most noteworthy application is the neural network-based explosive detection system
for safety checks at airports (Shea, Lin and Liu, 1989 and 1990). Dietz, et al. (1987, 1988 and 1989)
constructed a real-time diagnosis system for failure detection in the bearings and the fuel system of
a jet engine, and also a space shuttle engine system based on the test data. Casselman and Acres
(1990) developed a comprehensive diagnosis system using several networks for the monitoring of a
large satellite communication system. Neural networks have also found application in the fault
diagnosis of electronic circuit boards (Kagle, et al., 1990), the automatic control system (Marko, et
al., 1990), the operation of a nuclear plant model (Tsoukalas and Reyes-Jimenez, 1990), and in
transportation engineering by processing radar waves for detecting the presence of a waterproofing
membrane in an asphalt covered bridge deck (Vrchovnik, et al., 1990).

In the following paragraphs, several typical applications are described in detail and relevant
references are also provided.

3.7.1  Medical  Applications 

Appolloni, et al. (1990) describe an application in the diagnosis of epilepsy, a group of
neurological disorders characterized by the recurrence of epileptic seizures. A data set has been
constructed by comprehensively collecting all the clinical and laboratory information on 158 patients
presenting epileptic symptoms. The classification criterion for the disease has been proposed by the
Commission on Classification and Terminology of the International League Against Epilepsy. The
objective of the research is to compare the syndromic classification based on clinical criteria with
the categorization achieved with a backpropagation network.

A three layer backpropagation network with one hidden layer is used in this study. The input layer
has 724 units representing a list of questions coded in bit form, and the output layer has 31 nodes
representing the 31 possible diagnoses, which can in turn be clustered into 7 groups. There are
156 sets of data in total, of which 134 sets correspond to reliable diagnoses and 22 sets to uncertain or
fuzzy diagnoses. The former sets are used for training and the latter for testing the generalization
capability of the trained network. Through trial and error, it was found that a hidden layer with 50
nodes gave the best results both in training and testing.
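
To give a feel for the size of the 724-50-31 network described above, the following is a minimal
sketch of its forward pass and parameter count. It assumes sigmoid units with one bias per
non-input unit and small random initial weights; none of these details are stated in the source, and
the patient record is made up.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Return (weights, biases); small random weights are an assumption."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layer, inputs):
    """Sigmoid units: each output is sigmoid(w . x + b)."""
    weights, biases = layer
    outputs = []
    for row, b in zip(weights, biases):
        net = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(1.0 / (1.0 + math.exp(-net)))
    return outputs

def count_parameters(sizes):
    """Connection weights plus one bias per non-input unit."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

SIZES = [724, 50, 31]  # input bits, hidden nodes, diagnoses (from the text)
rng = random.Random(0)
hidden_layer = make_layer(SIZES[0], SIZES[1], rng)
output_layer = make_layer(SIZES[1], SIZES[2], rng)

answers = [rng.randint(0, 1) for _ in range(SIZES[0])]  # hypothetical coded record
scores = forward(output_layer, forward(hidden_layer, answers))
n_params = count_parameters(SIZES)
```

Under these assumptions the network has 37,831 adjustable parameters, far more than the 134
training sets, which is one plausible reason for wanting a smaller input representation.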

After training, the network is trimmed of connections and nodes with small connection strengths,
and the final network consists of 74 input units. This process distills the data representation
scheme to an efficient form that gives about 80 percent valid results on the single diagnosis and
95 percent on the clusters of diagnoses. Of course, a more powerful approach would be the use of a
neural net-based expert system.
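
The trimming step can be illustrated with a simple magnitude rule. The sketch below drops every
input unit whose strongest outgoing connection falls below a threshold. The rule, the threshold, and
the toy weight matrix are assumptions for illustration; the source does not give the exact trimming
criterion.

```python
def prune_inputs(hidden_weights, threshold):
    """Return indices of input units that keep at least one strong connection.

    hidden_weights[j][i] is the weight from input unit i to hidden unit j.
    """
    n_inputs = len(hidden_weights[0])
    kept = []
    for i in range(n_inputs):
        strongest = max(abs(row[i]) for row in hidden_weights)
        if strongest >= threshold:
            kept.append(i)
    return kept

def restrict(hidden_weights, kept):
    """Drop the weight columns that belong to pruned input units."""
    return [[row[i] for i in kept] for row in hidden_weights]

# Toy example: 4 input units feeding 2 hidden units.
weights = [
    [0.90, 0.01, -0.40, 0.00],
    [-0.02, 0.03, 0.75, 0.01],
]
kept = prune_inputs(weights, threshold=0.1)
smaller = restrict(weights, kept)
```

In this toy case input units 1 and 3 are removed because none of their connections reaches the
threshold.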

Boone, et al. (1990) used backpropagation neural networks for the interpretation of radiological
images in computer aided medical diagnosis of certain diseases. Two processes are involved in
the diagnosis of diseases based on radiological images: the identification of abnormalities in the
images and the interpretation of the abnormal findings. For the identification of abnormal anatomical
structures appearing on the radiographs, one hundred 25-pixel images, generated with Gaussian noise
and with signals added to 50 percent of them, were used to train a three layer network to indicate
abnormality in the image. The network thus has 25 nodes in the input layer, 5 nodes in the hidden layer,
and 1 node in the output layer. For computer aided diagnosis, another backpropagation neural
network is used to map radiographic findings to a list of plausible diagnoses. It was decided that
there were about 50 possible choices for radiographic findings and 23 possible diagnoses for the
newborn chest radiographs. The training and testing results showed close consistency with the
diagnoses from doctors (79 percent for positive diagnoses and 99 percent for negative diagnoses).
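
A training set of the kind described (one hundred 25-pixel noise images, half of them carrying a
signal) could be generated roughly as follows. The signal shape (a single brightened centre pixel),
its amplitude, and the noise standard deviation are assumptions chosen for illustration only.

```python
import random

def make_images(n_images=100, n_pixels=25, signal=1.0, noise_sd=1.0, seed=0):
    """Return (images, labels); label 1 marks an image with an added signal."""
    rng = random.Random(seed)
    labels = [1] * (n_images // 2) + [0] * (n_images - n_images // 2)
    rng.shuffle(labels)
    centre = n_pixels // 2  # hypothetical signal location
    images = []
    for label in labels:
        pixels = [rng.gauss(0.0, noise_sd) for _ in range(n_pixels)]
        if label:
            pixels[centre] += signal
        images.append(pixels)
    return images, labels

images, labels = make_images()
```

Each image would feed the 25 input nodes of the 25-5-1 network, with the label as the target for the
single output node.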

3.7.2  Communication  Systems 

Casselman and Acres (1990) described a large system called DASA/LARS that makes extensive use of
neural networks for diagnosis in the operation and maintenance of satellite communication networks.
On-line sensor information in an operational environment is used for training, and the resulting neural
network-based system has already been integrated into the working environment. The diagnosis is
based on two kinds of inputs: data obtained from a swept-frequency spectrum analyzer and database
information obtained from another subsystem in the Defense Satellite Communications System
(DSCS). A fault is diagnosed through comparison of the observed spectral data with the planned
parameters for each carrier stored in the DOSS database.

The backpropagation network is used in the system, and 9 different architectures are constructed
to diagnose a total of 13 different problems, including transponder saturation, data format
problems such as wrong modulation, incorrect coding and transmitter filter malfunction, and the
failure of an earth station’s autotrack feature. It is interesting that four layer backpropagation
networks with two hidden layers are utilized for all the subsystems. The system has been tested on-line
and has worked remarkably well. An updated system has been in operation at the satellite operations
center of DSCS at Fort Detrick, MD since 1989.

3.7.3  Mechanical  Systems 

Dietz, et al. (1987, 1988, and 1989) describe the fault diagnosis of jet and rocket engines by using
neural networks to construct the mapping from patterns of sensor data to a pattern associated with a
particular fault condition. Three layer backpropagation networks with one hidden layer are used in
this study. A jet engine diagnostic system is built to distinguish, directly from sensor data, the
behavior exhibited by bearing failures from that exhibited by fuel interruptions. The architecture of
the system is hierarchical and consists of five networks for each kind of sensor data. A higher level
network processes the sensor data directly and recognizes the fault type, and two lower level
networks for each fault type are then trained to identify the severity and duration of the fault. The
architecture for the processing of data from one sensor is shown in Fig. 3.17.

For the current prototype system, four sensors are employed to measure the combustion
temperature, exhaust gas temperature, low pressure turbine rotational speed, and high pressure turbine
rotational speed. Hence the system is composed of 20 networks in total, and the design of the input
layer is based on the sensor data acquired in a 4.0 second time interval. The training data are generated
from an engine simulation program called ATEST. The trained networks are tested using both crisp
data and data generated with a certain percentage of noise. The training and testing of the
system resulted in satisfactory performance in jet engine diagnosis.
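
The hierarchy of Fig. 3.17 and the count of 20 networks can be sketched as a small data structure.
The names below (`SENSORS`, `FAULTS`, `build_subsystem`, `diagnose`) and the stub classifiers are
hypothetical; they only mirror the routing described in the text, with a fault type network on top and
separate severity and duration networks for each fault class.

```python
SENSORS = ["combustion_temp", "exhaust_gas_temp", "lp_turbine_speed", "hp_turbine_speed"]
FAULTS = ["bearing", "fuel_system"]

def build_subsystem(make_network):
    """One fault-type network plus severity and duration networks per fault."""
    return {
        "fault_type": make_network(),
        "severity": {fault: make_network() for fault in FAULTS},
        "duration": {fault: make_network() for fault in FAULTS},
    }

def count_networks(system):
    """Total number of networks over all sensor subsystems."""
    return sum(1 + len(sub["severity"]) + len(sub["duration"]) for sub in system.values())

def diagnose(subsystem, window):
    """Route one sensor window through the two-level hierarchy."""
    fault = subsystem["fault_type"](window)
    return {
        "fault": fault,
        "severity": subsystem["severity"][fault](window),
        "duration": subsystem["duration"][fault](window),
    }

# Placeholder networks are enough to check the bookkeeping: 4 sensors x 5 networks.
system = {sensor: build_subsystem(lambda: None) for sensor in SENSORS}
total = count_networks(system)

# Stub classifiers standing in for trained networks (purely illustrative).
stub = {
    "fault_type": lambda window: "bearing" if max(window) > 1.0 else "fuel_system",
    "severity": {"bearing": lambda w: "high", "fuel_system": lambda w: "low"},
    "duration": {"bearing": lambda w: len(w), "fuel_system": lambda w: 0},
}
report = diagnose(stub, [0.2, 1.5, 0.3])
```

In a real system each placeholder would be a trained three layer backpropagation network taking the
samples from a 4.0 second window as input.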

The article also describes another diagnosis system for the rocket engine used in the space shuttle,
which uses experimental test data because of the lack of theoretical models. This latter approach would
have a more general implication for real-time diagnostics applications of neural networks.


Figure 3.17 - Architecture of a Subsystem in the Jet Engine Diagnosis System
[Sensor data feed a Fault Type Recognition neural network; for each fault class (BEARING, FUEL
SYSTEM) a Severity Determination neural network and a Duration Determination neural network follow.]


3.7.4 Explosive Detection

Shea, et al. (1989 and 1990) describe the construction of a neural network-based system for
explosive detection at airport check-ins. The application is probably one of the most successful cases
and has generated considerable public interest in the research and development of neural network
technology. The detection of explosives is based on the presence of nitrogen in the luggage, measured
using thermal neutron activation. A three layer backpropagation network with one hidden layer is
trained and tested on measurements gathered at the airport. The performance of the system is compared
with that of a conventional system using standard statistical techniques. The key parameters that
measure the performance of the systems are the probability of detection (PD) for the minimum amount
of explosive in a threat and the probability of false alarm (PFA) on bags without explosives. The neural
network based system has been installed in several airports in parallel with the conventional system
for a certain time. Through real world testing, it has been found that both systems perform equally
well in terms of detection rate, while the neural network based system has a considerably better false
alarm rate. Different learning models such as the counterpropagation network as well as four layer
backpropagation networks have also been investigated, and the results indicate that the three layer
backpropagation network performs best.
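
The two performance measures can be computed directly from paired ground-truth and alarm flags.
The sketch below is a generic illustration with made-up data, not the evaluation procedure used by
the authors.

```python
def detection_rates(has_explosive, alarm):
    """Return (PD, PFA) from paired ground-truth and alarm flags (1 or 0)."""
    threats = sum(has_explosive)
    clean = len(has_explosive) - threats
    hits = sum(1 for truth, a in zip(has_explosive, alarm) if truth and a)
    false_alarms = sum(1 for truth, a in zip(has_explosive, alarm) if not truth and a)
    return hits / threats, false_alarms / clean

# Ten hypothetical bags: the first four contain a threat.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
alarms = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
pd, pfa = detection_rates(truth, alarms)
```

Two systems with the same PD can then be ranked by PFA, which is the comparison reported above.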

3.8 Planning, Scheduling, and Optimization

Planning and scheduling problems are highly constrained combinatorial optimization problems,
many of which are known to be NP-hard. The well known Traveling Salesman Problem is a good
example involving path planning. With current approaches it is difficult, if not impossible, to find an
optimal or nearly optimal solution to such problems. Usually, an admissible solution is good
enough for acceptance. Since its advent, the Hopfield network has been considered by many
researchers as a means of obtaining an optimal or nearly optimal solution for this kind of NP-complete
problem, and extensive research has been carried out on the solution to the Traveling Salesman
Problem. Many real world problems, such as job-shop scheduling in mechanical engineering, crew
scheduling in the food service industry, and material handling, are combinatorial optimization
problems and can be transformed into the frame of a Traveling Salesman Problem. Because of this, use
of the Hopfield style network for planning and scheduling becomes feasible.
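
As an illustration of how a tour problem is cast as an energy to be minimized, the sketch below
evaluates the penalty-style energy usually associated with Hopfield and Tank's Traveling Salesman
formulation. Here `v[x][i]` is 1 when city x occupies tour position i; the coefficients `a`, `b`, `c`,
`d` and the four-city example are illustrative assumptions.

```python
def tsp_energy(v, dist, a=1.0, b=1.0, c=1.0, d=1.0):
    """Constraint penalties plus d times the tour length for a valid tour."""
    n = len(dist)
    # Each city should appear in one position only.
    row_penalty = sum(v[x][i] * v[x][j]
                      for x in range(n) for i in range(n) for j in range(n) if i != j)
    # Each position should hold one city only.
    col_penalty = sum(v[x][i] * v[y][i]
                      for i in range(n) for x in range(n) for y in range(n) if x != y)
    # Exactly n units should be active.
    total_active = sum(sum(row) for row in v)
    # Distances between cities in neighbouring tour positions (cyclic).
    length_term = sum(dist[x][y] * v[x][i] * (v[y][(i + 1) % n] + v[y][(i - 1) % n])
                      for x in range(n) for y in range(n) if x != y for i in range(n))
    return ((a / 2) * row_penalty + (b / 2) * col_penalty
            + (c / 2) * (total_active - n) ** 2 + (d / 2) * length_term)

# Four cities on the corners of a unit square, visited in order 0-1-2-3.
diag = 2 ** 0.5
dist = [
    [0.0, 1.0, diag, 1.0],
    [1.0, 0.0, 1.0, diag],
    [diag, 1.0, 0.0, 1.0],
    [1.0, diag, 1.0, 0.0],
]
valid = [[1 if i == x else 0 for i in range(4)] for x in range(4)]
invalid = [row[:] for row in valid]
invalid[0][1] = 1  # city 0 now also claims position 1

e_valid = tsp_energy(valid, dist)
e_invalid = tsp_energy(invalid, dist)
```

For a valid tour all penalty terms vanish and the energy reduces to `d` times the tour length; any
constraint violation raises the energy.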

Though the standard Hopfield and Tank network can be used to solve certain small optimization
problems with constraints, the general use of this approach is impeded by its tendency to converge to
a local minimum and its poor scaling properties for large problems. To overcome this difficulty,
various modifications of the Hopfield and Tank network have been proposed. Some of the well known
schemes are the Integer Linear Programming Neural Networks (Foo and Takefuji, 1988), the elastic net
(Durbin and Willshaw, 1987), the Supplier-Consumer Net (Parunak, et al., 1987), the Primal-Dual
network (Culioli, et al., 1990) and many others. After transforming a shared resource scheduling
problem into an unshared resource problem, Bourret and Goodall (1989) proposed the competitive
activation networks.

To date, different optimization problems encountered in various disciplines have been
solved with neural networks. Poliac, et al. (1987) used a novel representation scheme to solve the crew
scheduling problem; Parunak, et al. (1987) proposed a supplier-consumer network to solve material
handling problems; the Airline Marketing Tactician (Hutchison and Stephens, 1987) may be the first
commercial application. For the job-shop scheduling problem, Foo and Takefuji (1988), Chen (1990),
and Zhou, et al. (1990) have investigated different approaches and obtained satisfactory results. Other
applications include time-table scheduling (Yu, 1990), object avoidance touring planning (Wong
and Funka-Lea, 1990), linear programming (Culioli, et al., 1990; Kalaba and Moore, 1990), path
optimization (Hassoun and Sanghvi, 1990), and the scheduling of satellite broadcasting times
(Bourret, et al., 1990).

In the following paragraphs, some of the approaches to the scheduling and planning problems are
described in detail to highlight the main features. There are also many publications on the theoretical
analysis of optimization oriented neural networks (Maa and Shanblatt, 1990; Hellstrom and Kanal,
1990; Barbosa and de Carvalho, 1990). Due to time and space limitations, only some of the literature
on that subject is included in the reference list.


3.8.1  Satellite  Antennae  Scheduling  -  Bourret  and  Goodall  (1989) 


The results reported by Bourret and Goodall (1989) are unique in that they propose and prove a
theorem that transforms a shared resource scheduling problem into an unshared resource problem,
and they introduce a competitive activation based neural network to solve the unshared resource
scheduling problem. The proposed approach is tested on the optimal scheduling of antennae for low
level satellites. The antennae scheduling problem is, given the required broadcasting time, the
priority level of each satellite, and the time intervals within which the satellites are in sight of the
various antennae, to optimize the total broadcasting time weighted by the priority of satellites.

The architecture of the competitive activation based neural network consists of three layers, which
are designated as layer R, layer C, and layer T, as shown in Fig. 3.18. Each unit in layer R represents a
time slice assigned to a satellite among those competing for that time slice, and the unit uses the
competitive activation rules. Layer C consists of units that always keep the same activation level and
represent each possible time slice. At the beginning of the computation, the strength of the links
between layer C and layer R is the given priority of the satellite. The last layer, layer T, is composed of
a number of units that corresponds to the number of satellites in the system, and the activation level of
each unit represents how much broadcasting time has been scheduled for each satellite. A special
competitive learning rule similar to the standard winner-takes-all scheme is proposed to determine the
winner in layer R as well as the modification of connection strengths or weights.
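
The objective and the winner-takes-all idea can be illustrated with a deliberately simplified
stand-in. The sketch below is a greedy rule for a single antenna, not the competitive activation
network itself: in each time slice the visible satellite with the highest priority that still needs
broadcasting time wins the slice. All names and data are hypothetical.

```python
def schedule_slices(visible, priority, required):
    """Assign each time slice to one competing satellite.

    visible[t] lists the satellites in sight during slice t (the competitors),
    priority[s] is the weight of satellite s, and required[s] is the number of
    slices satellite s needs.
    """
    scheduled = {s: 0 for s in priority}  # loosely plays the role of layer T
    winners = []                          # loosely plays the role of layer R
    for competitors in visible:
        needy = [s for s in competitors if scheduled[s] < required[s]]
        winner = max(needy, key=lambda s: priority[s]) if needy else None
        if winner is not None:
            scheduled[winner] += 1
        winners.append(winner)
    return winners, scheduled

def weighted_time(scheduled, priority):
    """Total broadcasting time weighted by satellite priority."""
    return sum(priority[s] * scheduled[s] for s in scheduled)

visible = [["A", "B"], ["A", "B"], ["B"], ["B", "C"]]
priority = {"A": 3, "B": 2, "C": 1}
required = {"A": 1, "B": 2, "C": 1}
winners, scheduled = schedule_slices(visible, priority, required)
objective = weighted_time(scheduled, priority)
```

A greedy pass like this can miss better schedules when visibility windows overlap in awkward ways,
which is the kind of situation where an iterative, competition-based network is expected to help.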

In another article, Bourret, et al. (1990) identify the drawbacks encountered in the competitive
activation based network for the resource scheduling problem and present a new implementation with
a modified activation rule. The modified algorithm and architecture have the following features: 1) a
new competitive output function is introduced to distribute the activation in layer C among
competitors in layer R, 2) links are created between nodes in layer R so that they compete with each
other, 3) a decay factor is added to the activation rule for nodes in layer R, and 4) the activation update
rule for nodes in layer R is modified to include a certain amount of noise. The modified scheme
overcomes the shortcomings of the previous system and gives more robust results in different
simulation problems on resource scheduling.

3.8.2 Robot Assembly Sequence Planning - Chen (1990)

Chen (1990) describes the theory and application of a Hopfield network to the solution of an
assembly sequence problem, which is also an AND/OR precedence-constrained traveling salesman
problem. This problem involves the generation of all the possible assembly sequences and the
determination of the most promising one. A modified Hopfield network is used to solve the planning
problem. At first, the geometric constraints among parts are mapped into the connection weight matrix.
The optimum assembly sequence can then be found as the continuous dynamics of the neurons drive
the system toward a state of lower energy.

In applying Hopfield and Tank’s approach to this combinatorial problem, binary threshold neurons
and a symmetric connection weight matrix are enforced. The precedence constraints can be mapped to
the connection weights among neurons by giving a connection a positive or negative real value, or by
using biases to set the general level of excitability of the network, such that a change in the
input-output relation at each neuron results in a change of the activation level of the system. The
optimum sequence with AND precedence relationships is solved using a traditional second-order
Hopfield network. A higher-order Hopfield network is also designed to solve the assembly sequence
problem with OR precedence relationships.
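
The role of binary threshold neurons, a symmetric weight matrix, and biases can be seen in a
generic second-order Hopfield relaxation. The sketch below is not Chen's assembly network; the
three-unit weights and biases are made up, with mutual inhibition so that only one unit stays active.

```python
def energy(weights, biases, state):
    """E = -1/2 * sum_ij w_ij s_i s_j - sum_i b_i s_i, with states in {0, 1}."""
    n = len(state)
    pair = sum(weights[i][j] * state[i] * state[j] for i in range(n) for j in range(n))
    return -0.5 * pair - sum(b * s for b, s in zip(biases, state))

def relax(weights, biases, state, max_sweeps=100):
    """Asynchronous threshold updates until no unit changes."""
    state = list(state)
    trace = [energy(weights, biases, state)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            net = sum(weights[i][j] * state[j] for j in range(len(state))) + biases[i]
            new = 1 if net > 0 else 0
            if new != state[i]:
                state[i] = new
                changed = True
                trace.append(energy(weights, biases, state))
        if not changed:
            break
    return state, trace

# Symmetric weights with zero diagonal: every pair of units inhibits each other.
weights = [
    [0.0, -2.0, -2.0],
    [-2.0, 0.0, -2.0],
    [-2.0, -2.0, 0.0],
]
biases = [1.0, 0.5, 0.8]  # unit 0 is the preferred choice
final, trace = relax(weights, biases, [1, 1, 1])
best_possible = energy(weights, biases, [1, 0, 0])
```

With symmetric weights and a zero diagonal each update can only lower or keep the energy, so the
network settles. In this example it settles on unit 2 even though activating unit 0 alone has lower
energy, a small instance of the local minimum problem noted for Hopfield style networks.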




3.8.3  Linear  Programming  -  Culioli,  et  al.  (1990) 


Culioli, et al. (1990) introduce a new Primal-Dual network to solve the general linear
programming problem. The classical neural network approach to the linear programming problem is
Hopfield and Tank’s approach, and the solution usually converges to stable states. However, Hopfield
and Tank’s approach does not, in general, yield an optimal solution, and it has poor scaling properties.
This shortcoming may come from the penalization treatment of the constraints. In the proposed
approach, the constraints are treated with Lagrangian multipliers, and the network converges to primal
and dual admissible solutions. The authors show that the Primal-Dual network converges to admissible
solutions and can be used to get a very good approximation of the optimal cost.

3.8.4  Job-Shop  Scheduling  -  Foo  and  Takefuji  (1989) 

Job-shop scheduling is a resource allocation problem involving machines and jobs.
Each job may also consist of several subjobs subject to precedence constraints. For this scheduling
problem, it is very hard to obtain an optimal solution due to the large number of constraints. The
Hopfield network and the Integer Linear Programming Neural Networks have been investigated for
the job-shop scheduling problem, and it was observed that the two networks are not suitable for
hardware implementation due to their poor scaling properties (Foo and Takefuji, 1988).

Foo and Takefuji (1988) are probably the first to use a Hopfield type network to solve the
job-shop scheduling problem. The proposed approach is of general use for NP-complete optimization
problems with constraints. At first, the job-shop problem is mapped to a 2-D matrix representation
of neurons similar to that used for solving the traveling salesman problem. The constraints on
operational precedence are embedded in the network through the application of constant positive and
negative current biases to specific nodes. The solution of a job-shop problem is encoded in a set of cost
function trees in the matrix of stable states. Each node in the set of trees represents a job, and each link
represents the interdependency between jobs. The cost attached to each link is a function of the
processing time of a particular job. The starting time of each job can be determined by traversing the
paths leading to the root node of the tree. A computation circuit is used to compute the total
completion times of all jobs, and the cost difference is added to the energy function of the Hopfield
network. To reach the optimal solution, Simulated Annealing is used to help the system escape from
local minima. The use of an annealing algorithm is the most salient feature of the proposed approach.
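
The idea of reading starting times off the precedence structure can be sketched without any neural
network. Assuming each job may start as soon as all its predecessors have finished (machine
conflicts are ignored here), the earliest start of a job is the latest finish time among its
predecessors. The job names and processing times below are hypothetical.

```python
def earliest_start_times(processing_time, predecessors):
    """Earliest start of each job = latest finish among its predecessors.

    Assumes the precedence relation has no cycles.
    """
    start = {}

    def visit(job):
        if job not in start:
            finishes = [visit(p) + processing_time[p] for p in predecessors.get(job, [])]
            start[job] = max(finishes, default=0)
        return start[job]

    for job in processing_time:
        visit(job)
    return start

def total_completion_time(start, processing_time):
    """Sum of the completion times of all jobs."""
    return sum(start[job] + processing_time[job] for job in start)

processing_time = {"cut": 3, "drill": 2, "paint": 4, "pack": 1}
predecessors = {"drill": ["cut"], "paint": ["cut"], "pack": ["drill", "paint"]}
start = earliest_start_times(processing_time, predecessors)
cost = total_completion_time(start, processing_time)
```

A quantity like `cost` is the kind of term that would be fed back into the energy function, so that
lower energy corresponds to schedules that finish sooner.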

The drawback associated with the use of an annealing algorithm is that it is computationally
expensive: the procedure may take a prohibitive, in the limit unbounded, amount of time to find an
optimal solution as the size of the problem grows. This is also the criticism of using the Hopfield
network for combinatorial optimization problems in general. The performance of the algorithm is
tested by solving several job-shop scheduling problems with various degrees of complexity.

Foo and Takefuji (1988) also proposed an integer linear programming neural network, based on a
modified Tank and Hopfield neural network model, that uses a linear measurement of the cost function
to solve the job-shop scheduling problem. The cost function for minimization is the sum of the starting
times of all jobs subject to precedence constraints. The set of integer linear equations is solved by an
iterative linear programming technique with integer adjustments, and the linear and nonlinear zero-one
variables are represented by linear sigmoid amplifiers and nonlinear high-gain amplifiers with a step
function response. The approach shows some improvement over the Hopfield network with simulated
annealing.

Recently, Zhou, et al. (1990) introduced a novel approach to the job-shop scheduling problem by
using a modified version of the Linear Programming Network described by Tank and Hopfield (1986).
The proposed model uses a linear cost function instead of the quadratic energy function of the Hopfield
network. The important feature of the model is the incorporation of a product term into the energy
constant function instead of using too many control variables to resolve conflicts of operation on the
same machine. In a sense, it in fact implements the multiplication operation as an addition operation
so that the resulting network has good scaling capability. In solving a simulation problem with 4/3
job-shop scheduling, the proposed network uses only a small fraction of the neurons and
connections of the regular Hopfield network or the Integer Linear Programming Neural Networks (Foo
and Takefuji, 1988).


3.8.5  Path  Optimization  -  Hassoun  and  Sanghvi  (1990) 

Hassoun and Sanghvi propose a new neural network architecture for the path optimization
problem, in which a shortest path between two points in two or higher dimensions is sought. The
architecture of the network is of multilayer modular form, and the basic network consists of a locally
interconnected stage of simple neural subnets called comparators, which perform node potential
computations for a search map with one grounded node A. After the computations at all nodes have
settled into a stable potential surface having zero potential at the ground node A, an identical network
is used to compute a second potential surface having another ground node B as the zero potential
node. The nodes A and B are taken to be the end points of the optimal path. Next, the two potentials
computed for each node in the grid (one from the surface grounded at A, one from the surface grounded
at B) are added, and each sum is compared to the minimum potential of the network using a final layer
of input threshold neurons. The output of the final layer spatially encodes the optimal path between
points A and B. The computation time of the network is determined by the speed at which the
potential wave front spreads away from the ground node. In general, the convergence rate is very fast
for the network. The performance of the algorithm is demonstrated through optimal path computation
in 2-D space.
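
The two-surface idea has a simple conventional analogue, sketched below with breadth-first search
standing in for the comparator subnets. The potential of a cell is taken as its step distance from the
ground node, and a cell lies on an optimal path when its two potentials add up to the shortest A-to-B
distance. The grid, the 4-neighbour moves, and the unit step cost are assumptions.

```python
from collections import deque

def potential_surface(grid, ground):
    """Breadth-first 'wave front': steps from the ground node to every free cell."""
    rows, cols = len(grid), len(grid[0])
    potential = {ground: 0}
    queue = deque([ground])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in potential:
                potential[(nr, nc)] = potential[(r, c)] + 1
                queue.append((nr, nc))
    return potential

def optimal_path_cells(grid, a, b):
    """Cells whose summed potentials equal the shortest A-to-B distance.

    Assumes b is reachable from a.
    """
    pa = potential_surface(grid, a)
    pb = potential_surface(grid, b)
    shortest = pa[b]
    return {cell for cell in pa if cell in pb and pa[cell] + pb[cell] == shortest}

# 0 = free cell, 1 = obstacle.
grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
cells = optimal_path_cells(grid, a=(0, 1), b=(2, 1))
```

Here the route around the left side of the obstacle takes 4 steps and the route around the right
takes 6, so only the left-hand cells pass the threshold test.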

3.8.6  Time-Table  Scheduling  -  Yu  (1990) 


Time-table scheduling is perhaps one of the classical problems in AI applications. Yu (1990)
describes the application of a Hopfield network to class scheduling in an educational institute. The
scheduling problem is basically a graph coloring or graph partitioning problem. The relationship
between the constraints and the time-slots can be represented by an edge-weighted graph, which is very
similar to the graph arising from the decomposition problem of mapping loosely synchronous
problems onto parallel machines. The Simulated Annealing algorithm is used with the optimization
process to improve the capability of the network to escape local minima. A problem of scheduling
64 classes into two time-slots is solved with the proposed approach.
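
The graph partitioning view can be illustrated with plain simulated annealing on a conflict graph.
This sketch is not Yu's Hopfield formulation; the ring-shaped conflict graph, the unit edge weights,
and the cooling schedule are assumptions chosen to keep the example small.

```python
import math
import random

def conflicts(assignment, edges):
    """Total weight of conflicting class pairs placed in the same time-slot."""
    return sum(w for (u, v), w in edges.items() if assignment[u] == assignment[v])

def anneal(n_classes, edges, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Return the best two-slot assignment found and its conflict cost."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_classes)]
    cost = conflicts(current, edges)
    best, best_cost = list(current), cost
    for step in range(steps):
        temperature = t_start * (t_end / t_start) ** (step / steps)
        k = rng.randrange(n_classes)
        current[k] ^= 1  # move class k to the other slot
        new_cost = conflicts(current, edges)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temperature):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        else:
            current[k] ^= 1  # reject: undo the move
    return best, best_cost

# 64 classes; class i conflicts with class i + 1 (hypothetical ring of conflicts).
N_CLASSES = 64
edges = {(i, (i + 1) % N_CLASSES): 1 for i in range(N_CLASSES)}
best, best_cost = anneal(N_CLASSES, edges)
```

Accepting some uphill moves while the temperature is high is what lets the search leave poor local
minima; as the temperature falls the procedure behaves more and more like plain descent.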

3.8.7 On the Theory of Optimization

The previous descriptions of solving planning and scheduling problems with neural networks have
illustrated that these are optimization problems. There has been a lot of work on the theoretical
analysis of applying neural networks to combinatorial optimization problems, especially the Traveling
Salesman Problem. The following list is provided for completeness of presentation; a critical
examination of the theoretical analysis will provide insight and new directions for solving real
world optimization problems.

Cognitive Research
Publisher: Carfax Publishing
Address: Carfax Publishing Company
P. O. Box 25
Abingdon, Oxfordshire
OX14 3UE, UK

Title: Concepts in NeuroScience
Publisher: World Scientific Publishing
Address: World Scientific Publishing Co.
687 Hartwell Street
Teaneck, NJ 07666
Tel: (201) 837-8858

Title: Neurocomputers
Publisher: Gallifrey Publishing
Address: Gallifrey Publishing
PO Box 155
Vicksburg, Michigan 49097
Tel: (616) 649-3772

Title: Complex Systems
Publisher: Complex Systems Publications
Address: Complex Systems Publications, Inc.
P.O. Box 6149
Champaign, IL 61821-8149

Title: AI EXPERT
Publisher: Miller Freeman Publications
Address: 500 Howard St.
San Francisco, CA 94105
Tel: (415) 397-1881

A popular AI magazine, self-described as the magazine of Artificial Intelligence in practice.
Publishes introductory articles on neural networks such as the series “Neural Networks Primer”
and “Using Neural Nets” by Maureen Caudill.

Title: Neural Network Review
Publisher: Lawrence Erlbaum Associates (LEA)
Address: Lawrence Erlbaum Associates Inc.
365 Broadway
Hillsdale, NJ 07642

Review journal. Reviews of books, products, and selected papers from other journals; announcements
of news items, books, journals, and conference proceedings; copies of tables of contents for
several journals and proceedings. Each review article is usually accompanied by a response from the
original author.

Title: Neural Network News
Publisher: AIWeek Inc.
Address: Neural Network News
2555 Cumberland Parkway, Suite 29,
Atlanta, GA 30339
Tel: (404) 434-2187

A commercial newsletter. It presents reviews of neural network conferences, new products, and
research activities in the United States, Europe, and Japan.

APPENDIX B:

PUBLICLY AVAILABLE SIMULATORS FOR ARTIFICIAL
NEURAL NETWORKS


This listing of publicly available software for neural network simulation is compiled from
information in Neural Network Digest - an electronic bulletin board on Internet. Information
regarding the purpose and availability of each package is not guaranteed to be completely correct
because the validity of each ftp address has not been verified.

BPS  -  George  Mason  University  Back  Prop  Simulator 
Current  version  is  1.01  (Nov.,  1989) 

A  special-purpose  simulator  for  backpropagation  and  a  BP  speedup  technique  called  “gradient 
correlation.”  Available  via  anonymous  ftp  from  gmuvax2.gmu.edu  (129.174.1.8).  Distributed  as 
executable  for  VAX  8530  under  Ultrix  3.0,  and  versions  for  8088  based  IBM  PC,  and  80286/386  IBM  PC 
machines.  Includes  examples  and  a  tutorial  document.  Source  code  license  is  available. 

Contact: 

Eugene  Norris 

Computer  Science  Department 
George  Mason  University 
Fairfax,  Virginia  22032 
Email:  enorris@gmuvax2.gmu.edu 
Tel:  (703)323-2713 

MIRRORS/II  —  Maryland  MIRRORS/II  Connectionist  Simulator 

A general-purpose connectionist simulator. MIRRORS/II is implemented in Franz Lisp and will run
under Opuses 38, 42, and 43 of Franz Lisp on UNIX systems. It is currently running on a MicroVAX,
VAX and SUN 3.

To  obtain  this  simulator  you  must  sign  an  institutional  site  license.  A  license  for  individuals  is  not 
acceptable.  The  only  costs  incurred  are  for  postage  for  a  printed  copy  of  the  manual  and  tape  cartridge 
(you  send  your  own  1/4”  cartridge  or  TK50  cartridge  to  them,  if  desired.)  Instructions  for  obtaining  the 
software  via  ftp  are  returned  to  you  upon  receipt  of  the  license  agreement.  To  obtain  a  copy  of  the 
license  send  your  U.  S.  Mail  address  via  e-mail  to:  mirrors@cs.umd.edu. 

Or by U.S. Mail to:

Lynne D’Autrechy
University of Maryland
Department of Computer Science
College Park, MD 20742

NeurDS - The Neural Design and Simulation System.

Current Version is 3.1 (May, 1989)

A general purpose simulator. The system is licensed on a no-fee basis to educational institutions
by Digital Equipment Corporation. To obtain information, send your U. S. or electronic mail address
to:


Max McClanahan
Digital Equipment Corporation
1175 Chapel Hills Drive
Colorado Springs, Colorado 80920-3952
Email: mcclanahan%cookie.dec.com@decwrl.dec.com

You  should  receive  instructions  on  how  to  obtain  a  copy  of  the  manual  and  copies  of  the  license 
agreement. 

The  NeurDS  system  will  run  on  any  Digital  platform  including  Vax/VMS,  Vax/Ultrix,  and 
DECsystem/Ultrix.  A  graphics  terminal  is  not  required  to  support  the  window  interface.  Specific 
models  are  described  using  a  superset  of  the  C  programming  language,  and  compiled  into  a  simulator 
form.  This  simulator  can  accept  command  scripts  or  interactive  commands.  Output  can  take  the  form 
of  a  window-type  environment  on  VT100  terminals,  or  nonwindow  output  on  any  terminal. 

FULL  —  Fully  connected  temporally  recurrent  neural  networks. 

A  demonstration  network  described  in  “Learning  State  Space  Trajectories  in  Recurrent  Neural 
Networks.” 

The  author  (Barak  Pearlmutter,  bap@f.gp.cs.cmu.edu)  describes  this  as  “a  bare  bones  simulator 
for  temporally  recurrent  neural  networks”  and  claims  that  it  should  vectorize  and  parallelize  well.  It  is 
available  for  ftp  from  doghen.boltz.cs.cmu.edu.  Login  as  “ftpguest”,  password  “aklisp”.  Be  sure  to  ftp 
as  binary  for  the  file  “full/full.tar.Z”  (you  must  either  use  a  directory  named  full  on  your  local  machine, 
or  use  “get”  and  let  it  prompt  you  for  remote  and  local  file  names).  Do  not  attempt  to  change 
directories.  It  is  copyrighted  and  is  given  out  for  academic  purposes. 

GRADSIM  Connectionist  Network  Simulator. 

A  special-purpose  simulator  specifically  designed  for  experiments  with  the  temporal  flow  model. 
Latest  Version  1.7. 

Written in C; implementations for VAX (VMS and Ultrix), Sun, and CYBER are mentioned. Includes an
excellent  article  with  references.  The  simulator  is  available  for  anonymous  ftp  from  ai.toronto.edu 
(128.100.1.65).  For  information  contact: 

Raymond  Watrous 
Department  of  Computer  Science 
University  of  Toronto 
Toronto, Ontario M5S 1A4
Email:  watrous@ai.toronto.edu 

GENESIS - GEneral NEural Simulation System with

XODUS  -  X-windows  Output  and  Display  Utility  for  Simulations 

A general simulator, currently in a beta-test version (1990). From the release announcement (January
1990) by Jim Bower:

Full  source  for  the  simulator  is  available  via  ftp  from  genesis.cns.caltech.edu  (131.215.135.64).  To 
acquire FTP access to this machine it is necessary to first register for distribution by using telnet or
rlogin  to  log  in  under  user  “genesis”  and  then  follow  the  instructions.  When  necessary,  tapes  can  be 
provided  for  a  handling  fee  of  US$50.  Those  requiring  tapes  should  send  requests  to 
genesis-req@caltech.bitnet.  Any  other  questions  about  the  system  or  its  distribution  should  also  be 
sent to this address. GENESIS and XODUS are written in C and run on SUN and DEC graphics
workstations under UNIX (version 4.0 and up) and X-windows (version 11). The software requires
14 megabytes of disk space, and the tar file is approximately 1 megabyte.

The  current  distribution  includes  full  source  for  both  GENESIS  and  XODUS  as  well  as  three 
tutorial  simulations  (squid  axon,  multicell,  visual  cortex).  Documentation  for  these  tutorials  as  well  as 
three  papers  describing  the  structure  of  the  simulator  are  also  included.  As  described  in  more  detail  in 
the  “readme”  file  at  the  ftp  address,  those  interested  in  developing  new  GENESIS  applications  are 
encouraged to become registered members of the GENESIS users group (BABEL) for an additional
one-time $200 registration fee. As a registered user, one is provided documentation on the simulator
itself,  access  to  additional  simulator  components,  bug  report  listings,  and  access  to  a  user’s  bulletin 
board. 

SunNet 

A  generalized  simulator.  Version  5.5.2.4  currently. 

Available  for  anonymous  ftp  from  boulder.colorado.edu  (128.138.240.1).  While  this  program  was 
obviously  written  for  Sun  workstations  (versions  for  Suntools  and  the  X-window  environment),  the 
documents  list  other  configurations.  These  include  a  nongraphic  version  which  runs  on  “any  UNIX 
machine,”  and  versions  which  run  on  an  Alliant  or  UNIX  machine  and  send  data  to  a  graphics  support 
program running on a Sun workstation. It is very easy to install. A mailing list exists for users of the
simulator. 

RCS - The Rochester Connectionist Simulator

A general simulator. Version 4.2 currently.

Available  for  anonymous  ftp  from  cs.rochester.edu  (192.5.53.209).  Tapes  may  be  purchased  (1600 
bpi  1/2”  reel  or  QIC-24  Sun  1/4”  cartridge)  from: 

Peg Meeker
Computer  Science  Department 
University  of  Rochester 
Rochester,  New  York  14627 

C  source  code  is  provided,  including  a  graphic  interface  which  may  function  under  X  Windows  or 
SunView  on  Sun  Workstations.  A  wide  variety  of  Unix  machines  are  supported,  and  the  simulator  may 
be  used  without  the  graphics  interface.  A  version  for  the  Macintosh  is  included  in  the  distribution. 
Mailing  lists  exist  for  users  and  bug  reports. 

SFINX - Structure and Function in Neural Connections

A general simulator. Version 2.0 (November 1989)

In order to ftp this simulator, a license agreement must be submitted. Upon receipt of this
agreement, instructions and the password to ftp the software are made available. To obtain the license,
write to:

Machine  Perception  Laboratory 
Computer  Science  Department 
University  of  California 
Los  Angeles,  CA  90024 

This  system  requires  color  to  operate  the  graphics  interface,  but  may  be  operated  without 
graphics.  Support  for  Sun,  Ardent  Titan,  HP  300,  and  IBM  PC  RT  machines  is  specifically  mentioned, 
but  other  Unix  platforms  should  function  as  well.  Specific  graphics  support  is  provided  for  Matrox 
VIP  1024,  Imagraph  AGC-1010P,  HP  Starbase  and  X  Windows. 

Mactivation 

A specialized simulator for investigating associative memory using the delta rule and Hebbian
learning. Version 3.3 currently.

A  public  domain  version  is  available  for  anonymous  ftp  from  the  University  of  Colorado  at 
Boulder  (boulder.colorado.edu,  128.138.240.1)  or  possibly  by  contacting  the  author. 

Mike Kranzdorf
University of Colorado
Optoelectronic Computing Systems Center
Campus Box 525
Boulder, Colorado 80309-0525
Email:  mikek@boulder.colorado.edu 

Future versions will probably not be public domain, but will be available from Oblio, Inc., 5942
Sugarloaf Road, Boulder, Colorado 80309. The simulator is provided as an executable for the Apple
Macintosh.

PDP  Simulators 

Several  special  purpose  simulators  are  provided  with  the  following  book: 

McClelland, J. L., and David E. Rumelhart, Explorations in Parallel Distributed Processing, Vol. III,
Cambridge: MIT Press, 1988.

The simulators were written in C, and versions for both the IBM PC and the Macintosh exist.

Hopfield-style Network Simulator

A special-purpose simulator for experimentation with the Hopfield-style network.

Software is available by e-mail upon request from the author, Arun Jagota. It is written in C and
should be useful on 32-bit Unix machines; an MS-DOS version is also supplied. Arun’s email
address is jagota@cs.buffalo.edu.

APPENDIX C:

NEURAL  NETWORK  BOOKS  AND  PROCEEDINGS 


This section lists some of the well-known publications on neural networks, connectionist systems,
computational psychology, genetic algorithms, and vision and perception, together with proceedings of
conferences on neural networks. The list is intended to give a historical perspective on the development
of neural network research, to provide fundamental materials for beginners entering the field, and to
point researchers to state-of-the-art publications from recent conferences dedicated to the research and
application of neural networks.

1.  Proceedings  of  the  IEEE  First  International  Conference  on  Neural  Networks,  IEEE,  New  York,  June 
1987. 

2.  Proceedings  of  the  IEEE  International  Conference  on  Neural  Networks,  IEEE,  New  York,  June  1988. 

3. Proceedings of the 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T.
Sejnowski, Eds., Carnegie Mellon University, Morgan Kaufmann Publishers, San Mateo, CA,
1989.

4.  Proceedings  of  the  International  Joint  Conference  on  Neural  Networks,  Co-sponsored  by  IEEE  and 
the  International  Neural  Network  Society,  Washington,  D.  C.,  1989. 

5.  Proceedings  of  the  International  Joint  Conference  on  Neural  Networks,  Co-sponsored  by  IEEE  and 
the  International  Neural  Network  Society,  Washington,  D.  C.,  1990. 

6.  Proceedings  of  the  International  Joint  Conference  on  Neural  Networks,  Co-sponsored  by  IEEE  and 
the  International  Neural  Network  Society,  San  Diego,  1990. 

7. Proceedings of the First International Conference on Genetic Algorithms, J. J. Grefenstette (Ed.),
Lawrence Erlbaum Publishers, Hillsdale, NJ, 1987.

8. Proceedings of the Second International Conference on Genetic Algorithms, J. J. Grefenstette
(Ed.), Lawrence Erlbaum Publishers, Hillsdale, NJ, 1988.

9.  Proceedings  of  the  Third  International  Conference  on  Genetic  Algorithms,  J.  D.  Schaffer  (Ed.), 
Morgan  Kaufmann  Publishers,  San  Mateo,  CA,  1990. 

10.  Ackley,  D.,  A  Connectionist  Machine  for  Genetic  Hillclimbing,  Kluwer  Academic  Publishers,  1987. 

11.  Aleksander,  I.  (Ed.),  Neural  Computing  Architecture,  The  MIT  Press,  Cambridge,  MA,  1989. 

12. Amari, S. I., and Arbib, M. (Eds.), Competition and Cooperation in Neural Networks,
Springer-Verlag, New York, 1982.

13.  Anderson,  J.  and  Lehmkuhle,  S.  (Eds.),  Synaptic  Modification,  Neuron  Selectivity,  and  Nervous 
System  Organization,  Lawrence  Erlbaum,  Hillsdale,  NJ,  1985. 

14.  Anderson,  D.  Z.  (Ed.),  Neural  Information  Processing  Systems,  American  Institute  of  Physics,  1988. 

15. Arbib, M. A., Caplan, D., and Marshall, J. C. (Eds.), Neural Models of Language Processes, The
Academic Press, New York, 1982.

16. Arbib, M. A., Brains, Machines and Mathematics, 2nd edition, Springer-Verlag, New York, 1987.

17. Basar, E., Flohr, H., Haken, H., and Mandel, A. J. (Eds.), Synergetics of the Brain, Springer-Verlag,
New York, 1983.

18.  Beck,  J.,  Hope,  B.,  and  Rosenfeld,  A.  (Eds.),  Human  and  Machine  Vision,  Academic  Press,  New 
York,  1983. 

19.  Caianiello,  E.  R.  (Ed.),  Parallel  Architectures  and  Neural  Networks,  World  Scientific,  Singapore, 
1989. 

20.  Carbonell,  J.  G.  (Ed.),  Machine  Learning:  Paradigms  and  Methods,  The  MIT  Press,  Cambridge, 
Massachusetts,  1990. 

21. Casti, J. L., Alternate Realities: Mathematical Models of Nature and Man, Wiley Interscience,
1989.

22. Caudill, M., and Butler, C., Naturally Intelligent Systems, The MIT Press, Cambridge,
Massachusetts, 1990.

23.  Commons,  M.  L.,  Grossberg,  S.,  and  Staddon,  J.  E.  R.  (Eds.),  Neural  Network  Models  of 
Conditioning  and  Action,  Lawrence  Erlbaum,  Hillsdale,  NJ,  1991. 

24. Cornsweet, T. N., Visual Perception, Academic Press, New York, 1970.

25. Davis, L. (Ed.), Genetic Algorithms and Simulated Annealing, Morgan Kaufmann Publishers, Los
Altos, CA, 1987.

26. Davis, J., Newburgh, R., and Wegman, E. (Eds.), Brain Structure, Learning, and Memory, AAAS
Symposium Series, 1987.

27. Dayhoff, J. E., Neural Network Architectures: An Introduction, Van Nostrand Reinhold, New York,
1990.

28.  Denker,  J.  S.  (Ed.),  Neural  Networks  for  Computing,  American  Institute  of  Physics,  1988. 

29. Durbin, R., Miall, C., and Mitchison, G., The Computing Neuron, The Academic Press, New York,
1989.

30.  Eberhart,  R.  C.,  and  Dobbins,  R.  W.  (Eds.),  Neural  Network  PC  Tools,  The  Academic  Press,  New 
York,  1990. 

31. Eckmiller, R., Hartmann, G., and Hauske, G. (Eds.), Parallel Processing in Neural Systems and
Computers, North-Holland, Amsterdam, The Netherlands, 1990.

32.  Freeman,  J.,  and  Skapura,  D.,  Artificial  Neural  Systems;  Theory  and  Practices,  The  Academic  Press, 
New  York,  1991. 

33.  Freeman,  W.  J.,  Mass  Action  in  the  Nervous  System,  The  Academic  Press,  New  York,  1975. 

34. Garner, W. R., The Processing of Information and Structure, Lawrence Erlbaum Associates, Inc.,
Hillsdale, NJ, 1974.

35. Goldberg, D. E., Genetic Algorithms in Search, Optimization, and Machine Learning,
Addison-Wesley Publishers, New York, 1989.

36. Grossberg, S. (Ed.), Mathematical Psychology and Psychophysiology, American Mathematical
Society, Providence, RI, 1981.

37.  Grossberg,  S.,  Studies  of  Mind  and  Brain:  Neural  Principles  of  Learning,  Perception,  Development, 
Cognition  and  Motor  Control,  Reidel  Press,  Boston,  1982. 

38. Grossberg, S., and Kuperstein, M., Neural Dynamics of Adaptive Sensory Motor Control: Ballistic Eye
Movements, North-Holland, Amsterdam, 1986.

39.  Grossberg,  S.,  The  Adaptive  Brain,  I:  Cognition,  Learning,  Reinforcement  and  Rhythm, 
North-Holland,  Amsterdam,  1987. 

40.  Grossberg,  S.,  The  Adaptive  Brain,  II:  Vision,  Speech,  Language,  and  Motor  Control, 
North-Holland,  Amsterdam,  1987. 

41. Grossberg, S. (Ed.), Neural Networks and Natural Intelligence, The MIT Press, Cambridge,
Massachusetts, 1988.

42. Hawkins, R. D., and Bower, G. H. (Eds.), Computational Models of Learning in Simple Neural
Systems, The Academic Press, New York, 1989.

43. Hebb, D. O., The Organization of Behavior, John Wiley and Sons, New York, 1949.

44.  Hertz,  J.,  Krogh,  A.,  and  Palmer,  R.,  Introduction  to  the  Theory  of  Neural  Computation,  The 
Academic  Press,  New  York,  1990. 

45.  Hinton,  G.  E.,  and  Anderson,  J.  A.  (Eds.),  Parallel  Models  of  Associative  Memory,  Lawrence 
Erlbaum,  Hillsdale,  NJ,  1981. 

46.  Janko,  W.  H.,  Roubens,  M.,  and  Zimmermann,  H.-J.  (Eds.),  Progress  in  Fuzzy  Sets  and  Systems, 
Theory  And  Decision  Library,  Kluwer  Academic  Publishers,  The  Netherlands,  1990. 

47. Khanna, T., Foundations of Neural Networks, Addison-Wesley, New York, 1990.

48.  Koch,  C.  (Ed.),  Computation  and  Neural  Systems,  The  Academic  Press,  New  York,  1990. 

49. Kohonen, T., Associative Memory: A System Theoretical Approach, Springer-Verlag, New York,
1977.

50. Kohonen, T., Self-Organization and Associative Memory, Springer-Verlag, New York, 1984.

51. Levine, D. S., Introduction to Neural and Cognitive Modeling, Lawrence Erlbaum, Hillsdale, NJ,
1990.

52. Levine, D. S., and Leven, S. J. (Eds.), Motivation, Emotion, and Goal Direction in Neural Networks,
Lawrence Erlbaum, Hillsdale, NJ, 1990.

53.  MacGregor,  R.  J.,  Neural  and  Brain  Modeling,  The  Academic  Press,  New  York,  1987. 

54. Marr, D., Vision, W. H. Freeman, San Francisco, 1982.

55.  Mead,  C.,  Analog  VLSI  and  Neural  Systems,  The  Academic  Press,  New  York,  1989. 

56.  Mel,  B.,  Connectionist  Robot  Motion  Planning,  The  Academic  Press,  New  York,  1990. 

57. Minsky, M., and Papert, S., Perceptrons: An Introduction to Computational Geometry, The MIT
Press, Cambridge, Massachusetts, 1969 and 1988.